Labelizer: systematic selection of protein residues for covalent fluorophore labeling

Gebhardt, Christian; Bawidamann, Pascal; Spring, Anna-Katharina; Schenk, Robin; Schütze, Konstantin; Moya Muñoz, Gabriel G.; Wendler, Nicolas D.; Griffith, Douglas A.; Lipfert, Jan; Cordes, Thorben

doi:10.1038/s41467-025-58602-y

Article
Open access
Published: 04 May 2025

Nature Communications volume 16, Article number: 4147 (2025)

Abstract

Attaching fluorescent dyes to biomolecules is essential for assays in biology, biochemistry, biophysics, biomedicine and imaging. A systematic approach for the selection of suitable labeling sites in macromolecules, particularly proteins, is missing. We present a quantitative strategy to identify such protein residues using a naïve Bayes classifier. Analysis of >100 proteins with ~400 successfully labeled residues allows us to identify four parameters, which can rank residues via a single metric (the label score). The approach is tested and benchmarked by inspection of literature data and experiments on the expression level, degree of labeling, and success in FRET assays of different bacterial substrate binding proteins. With the paper, we provide a Python package and webserver (https://labelizer.bio.lmu.de/) that perform an analysis of a pdb-structure (or model), label score calculation, and FRET assay scoring. The approach can facilitate building a central open-access database to continuously refine the label-site selection in proteins.

Introduction

Microscopy and spectroscopy techniques are ubiquitously used in the life sciences, in biophysical and medical assays to investigate the structure, interactions, and dynamics of macromolecules and their complexes down to the single-molecule level^1,2,3,4,5. Many applications require specific labeling of the biomolecule of interest with fluorescent probes^{6,7,8,9,10,11,12}. Whereas fluorescent proteins are the first choice for imaging applications in live-cells^13,14,15, synthetic organic fluorophores (dyes) are often used for high-sensitivity applications including single-molecule detection^16,17,18 and super-resolution microscopy^19,20,21. A common strategy for the (covalent) attachment of functional probes to proteins, including dyes, EPR spin probes, nanoparticles, and reactive surfaces is via reactive linker moieties^6,22.

A range of labeling strategies exists that exploit reactive groups, each with unique (dis)advantages. Coupling to amino groups in lysine residues can be achieved via N-hydroxysuccinimide (NHS)-esters, but this approach lacks specificity because of the abundance of lysine residues in proteins²². Alternatively, a terminally located His-tag or the N-terminus of the protein itself can be used for selective attachment of functional probes, with the disadvantage that the choice of labeling position is greatly curtailed²². In contrast, peptide tags (e.g., CLIP, SNAP, Halo, etc.) can facilitate covalent or enzymatic probe attachment (AP-BirA, LPXTG-SortaseA, etc.) at any desired location, but the size of tags limits applications and can impact protein function²³. The most widely used strategy for site-specific labeling of proteins is, therefore, to introduce non-native cysteine residues and to label their sulfhydryl-moiety via a maleimide-conjugate of the selected probe^22,24. Cysteine residues can be labeled with minimal effects on protein structure and function. Alternatively, unnatural amino acids (UAAs) can be introduced as targets for labeling. UAAs have proven particularly useful in cases where the removal of native cysteines is not possible due to their relevance (or abundance) and for live-cell labeling, where too many different proteins with cysteine residues are present^{25,26,27,28,29,30}.

The introduction of cysteine residues or UAAs has become the method of choice for many spectroscopic and microscopic studies of proteins, including the characterization of structural and functional dynamics by single-molecule Förster resonance energy transfer (smFRET)^28,31,32 or pulsed electron-electron double resonance spectroscopy (PELDOR or DEER)^33,34,35,36. Therefore, the ability to select optimal labeling sites for the introduction of suitable probes has grown in importance^37,38,39. Currently, labeling sites are typically selected based on manual inspection of the protein structure in a lengthy trial-and-error process to identify, via physicochemical intuition, labeling sites that are not essential for protein structure or function^{40,41,42,43,44,45,46,47,48,49}, but that are also compatible with the assay requirements, e.g., for FRET to result in an inter-fluorophore distance close to the Förster radius R₀^28,31,32. Frequently encountered problems when selecting a labeling site for fluorescent dyes (Fig. 1A) range from (i) influence of the fluorophore on protein properties, including altered biochemical function (Fig. 1A, “Protein”), (ii) low labeling efficiency (Fig. 1A, “Labelling efficiency”), or (iii) unwanted dye-protein interactions (Fig. 1A, “Dye Orientation”), to (iv) unpredictable or unfavorable photophysical properties of the dyes at the chosen site (Fig. 1A, “Spectroscopic Properties”). Suitable residues for labeling must not only enable specific and efficient attachment of fluorophores, but also avoid the problems summarized in Fig. 1A. Currently, the selection of labeling sites is often based on sensible rules of thumb⁵⁰ selecting those residues that satisfy assay requirements (e.g., distance constraints for FRET^51,52,53,54), but that are also solvent accessible⁵⁵, show low conservation scores²⁸ and are not related to protein function or the presence of fluorescence quenchers such as tryptophans^{50,51,53,54,56}.

**Fig. 1: Labelizer workflow to score protein residues for labeling and FRET experiments.**

Here, we introduce an automated analysis pipeline based on a naïve Bayes classifier^57,58 to select suitable label sites using information on protein structure and sequence, e.g., from the protein data bank, PDB (Fig. 1C, step 1). To systematically compare sites, we introduce a quantitative label score LS, which indicates the suitability of a protein residue to become a label-site, at which any of the problems shown in Fig. 1A are minimal. We assembled a database of publications that report the successful labeling of protein variants used in biophysical assays and identified an ideal set of parameters to allow the ranking of such residues (Fig. 1C, step 2/3). LS can be calculated independently of the choice of the label (fluorophore, EPR probe, beads, surfaces, etc.), yet we here focus on the use and characterization of LS for the attachment of fluorescent dyes to proteins. We also extended our analysis to pairs of residues for FRET assays, where the interdye distance should be close to the Förster radius to obtain the highest sensitivity (Fig. 1C, step 4). Therefore, we score different residue pairs according to LS and simulated distances to obtain an optimal FRET assay, and we express the suitability of a residue pair as a FRET score. We support the predictive power of the LS and FRET scores with data from the literature and experiments on substrate-binding proteins (SBPs)^59,60,61.

To make the analysis routine available to a large community of researchers, we introduce a Python package called “labelizer”, which implements our analysis of protein structures, label score calculation, and FRET assay scoring. The labelizer package allows researchers to build on our findings and adapt the code for their specific needs. For straightforward use, we also provide a webserver (https://labelizer.bio.lmu.de/) with a user-friendly interface to apply our analysis approach without any programming effort.

Results

Database of successfully labeled residues

As the basis of our label-site selection tool, we created a database of proteins that have been successfully labeled with fluorophores. A large set (> 1000) of peer-reviewed papers and preprints was screened for labeled cysteine or UAA residues in proteins. We include protein residues in the database that have been covalently and site-specifically labeled at cysteines (predominantly) or UAAs with organic fluorophores. Note that we also included some spin labels or biotin-linked fluorophores, yet these represent < 5% of all labels in the database (see Supplementary Fig. 1). Furthermore, only residues are included for which the structure of the protein has been deposited in the PDB. For the included proteins, we extract information on the labeled residue (chain, number), the type of mutation used for labeling (cysteine or UAA), the assay type (e.g., single fluorophore assays, smFRET assay with two labels, imaging, bulk FRET, etc.), and the type of label. We then gathered additional information on the protein, such as its oligomeric state (monomer, dimer, complexes), whether the protein structure has been experimentally determined or only a homology model is available, and whether it is a soluble or a membrane protein. Overall, we identified labeled residues in > 100 different proteins from > 100 publications (see Supplementary Data: Reference Database Labelizer). An overview of the data and summary statistics are presented in Supplementary Fig. 1.

We used a standardized pre-processing routine (see Methods and Supplementary Note 1) to extract all relevant residues from the pdb-files of the proteins in the database. The final data set from 104 pdb structures contains 43357 residues, 396 of which are reported to have been successfully labeled (the other residues are considered unknown). For all residues in our database, we compute multiple parameters that can be assigned to one of the four major categories (Fig. 1B): (i) conservation score CS, (ii) solvent exposure SE, (iii) secondary structure SS, and (iv) amino acid similarity of the exchanged residues to a cysteine, which we abbreviate as cysteine resemblance CR (see Supplementary Note 1 with Tables 1–6). The parameters are either directly extracted from the residues in question, e.g., amino acid type, mass, charge, and size, or calculated with the help of freely available software (conservation score (ConSurf^62,63), solvent exposure (DSSP⁶⁴, HSE⁶⁵, MSMS⁶⁶), and secondary structure (DSSP⁶⁴)). Altogether, we obtain 28 parameters for each residue.

Bayesian approach to the prediction of labeling sites

To identify suitable residues for labeling, we are interested in P(l | s), the conditional probability that the residue can be labeled given a parameter value s. By Bayes’ law

$$P({l|s})=\frac{P\left({s|l}\right)}{P\left(s\right)}P(l),$$

(1)

$P(s)$ is the probability distribution of the parameter values $s$ over all residues, whether or not they can be labeled, while $P({s|l})$ is the probability distribution of the parameter values $s$ given that the residue can be labeled. Finally, $P(l)$ is the a priori probability that a residue can be labeled. While $P\left(s\right)$ and $P({s|l})$ can be readily computed from our database of labeled protein structures, $P(l)$ is harder to assess since the literature is biased towards reporting successful attempts of labeling that have provided relevant insights. Since P(l) only scales the final probability and does not affect the predictions of the relative ease of labeling for different residues, we decided to here use a simplified parameter score

$${PS}(s)=\frac{P({s|l})}{P(s)}$$

(2)

instead of P(l | s) to assess the suitability of residues for labeling. PS(s) is, in essence, the odds ratio for a given parameter value to occur in a labeled residue compared to randomly selected residues. For all 28 parameters, we computed $P({s|l})$ distributions for the 396 successfully labeled residues and $P(s)$ distributions from all 43357 residues of the 112 chains of the database (Fig. 2A and Supplementary Figs. 2, 3).
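As an illustration of Eq. (2), the following minimal sketch computes PS(s) for a categorical parameter from two lists of observed values. The function name `parameter_score` and the toy data are hypothetical and are not taken from the labelizer package or the database; binning of numerical parameters is omitted.

```python
from collections import Counter


def parameter_score(all_values, labeled_values):
    """Return PS(s) = P(s|l) / P(s) for every category s seen in all_values.

    all_values:     parameter value of every residue in the data set
    labeled_values: parameter value of the successfully labeled residues
    """
    n_all = len(all_values)
    n_labeled = len(labeled_values)
    counts_all = Counter(all_values)
    counts_labeled = Counter(labeled_values)

    scores = {}
    for s, count in counts_all.items():
        p_s = count / n_all                                # P(s)
        p_s_given_l = counts_labeled.get(s, 0) / n_labeled  # P(s|l)
        scores[s] = p_s_given_l / p_s
    return scores


# Toy data, invented purely for illustration (not from the database).
all_ss = ["helix"] * 50 + ["sheet"] * 30 + ["coil"] * 20
labeled_ss = ["helix"] * 4 + ["sheet"] * 1 + ["coil"] * 5
ps = parameter_score(all_ss, labeled_ss)
```

In this toy example, values above 1 mark categories that are enriched among labeled residues and values below 1 mark categories that are depleted.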

As a control, we compared the probability distributions $P(s)$ from our database of successfully labeled residues with the distributions computed for a random selection of protein chains from the PDB (PDBselect, November 2017)^67,68 (see “Methods”). Here, we find only minor differences, indicating that the protein parameters in our database are representative of the overall PDB content (Supplementary Fig. 2). One notable difference is that cysteines are much less abundant (by ~ 50%) in the database of labeled proteins compared to the overall PDB, suggesting that cysteine insertion and labeling is easier (or at least more common) for proteins with fewer native cysteines (Supplementary Fig. 2). Although we also included residues that were labeled via UAA incorporation, our database indicates that cysteine labeling is still the predominant strategy for proteins since it makes up ~ 90% of all labeled residues in our database (Supplementary Fig. 1D).

We find clear differences between P(s | l) and P(s) and, therefore, non-uniform PS distributions for most of the investigated parameters (Fig. 2A, C and Supplementary Fig. 6), showing that they indeed contain information about the suitability of residues to serve as label sites. To evaluate which parameters are most predictive, we computed PS distributions for 28 parameters (numbered from #1 to #28) from all four categories from our database (Fig. 2 and Supplementary Table 1, 4). For each PS distribution, we analyzed their mean-square deviation from an equal distribution, the Gini coefficient, and the Shannon entropy (see Supplementary Note 1 and Supplementary Table 6). We find that the PS distributions for many parameters clearly deviate from an equal distribution and contain significant information (low Shannon entropy), e.g., seen in #1: relative surface area (Wilke), #4: first half-sphere exposure (10 Å), #16: variant length in homologs (see Supplementary Fig. 3). Other parameters contain barely any information such as #17 cysteine in homologs (yes/no), or #27 amino acid charge (Supplementary Fig. 3). Thus, strikingly, it is largely unpredictive for labeling of a residue whether a cysteine is found in one of the homolog proteins at the same position or whether the residue is charged (see parameter #17 and #27, Supplementary Fig. 3). One might have expected that residues with cysteine homologs are easily mutated to cysteines, and therefore, significantly enhanced in our scoring, which is not the case.

After establishing the predictive power of individual parameters, we investigated what combinations of parameters should be used. For this, we calculated the correlation between all parameters to judge their statistical independence, which is desirable for our Bayesian analysis (Fig. 2B). Since we deal with categorical data (e.g., secondary structure) and numerical data (e.g., relative surface area), we used Pearson correlation, interclass correlation and Cramér’s V for the combinations of numeric-numeric, categorical-numeric, categorical-categorical values, respectively (see Methods for details). We formed sets of four parameters and used a correlation measure (2-norm of all paired correlations, see Methods) to calculate a combined correlation estimator for all combinations of parameters (Supplementary Fig. 4). Whereas this combined correlation-derived measure shows higher values for most combinations of two or more parameters within the same categories CS, SE, CR, and SS, the correlation of combinations of parameters from different categories was smaller (< 0.5). This effect was independent of whether parameters with high or low predictive power (MSD / Shannon entropy) were combined (Fig. 2B and Supplementary Fig. 4). The overall low correlation between parameters from different categories justifies our categorization and their consideration as independent variables if we restrict our selection to one parameter per category. The strong correlation within categories also suggests that the choice of the particular parameter from one category is not critical, i.e., most of the parameters can account for the properties of the respective category.
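The idea of a combined correlation estimator can be sketched as follows for purely numerical parameters. This is only an illustration under stated assumptions: it uses Pearson correlation for every pair, takes the unnormalized 2-norm of the pairwise coefficients, and the names `pearson` and `combined_correlation` are hypothetical. The exact definition and any normalization used in the paper are given in its Methods and may differ.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def combined_correlation(columns):
    """2-norm of all pairwise correlations between parameter columns.

    Assumption: unnormalized Euclidean norm over all unordered pairs.
    """
    return math.sqrt(sum(pearson(a, b) ** 2 for a, b in combinations(columns, 2)))


# Toy columns, invented for illustration: x and y are perfectly correlated,
# z is uncorrelated with both.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
z = [1.0, -1.0, -1.0, 1.0]
combined = combined_correlation([x, y, z])
```

A set of mutually uncorrelated parameters gives a value near 0, which is the situation favored for a naïve Bayes combination.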

The combined label score predicts potential labeling sites

To combine parameter scores into a final assessment of a given residue to serve as a label site, we introduce a combined label score, LS. By standard probability theory, different parameters s_i can be combined by

$$P\left( l |{\bigcap }_{i=1}^{n}{s}_{i}\right)={\prod }_{i=1}^{n}P({l|}{s}_{i})=\frac{{\prod }_{i=1}^{n}P\left({s}_{i} | l\right)P\left(l\right)}{{\prod }_{i=1}^{n}P\left({s}_{i}\right)}$$

(3)

under the assumption that they are independent, where Π denotes the product and ∩ the intersection. This naïve Bayes classification^57,58 is known to give good predictions for low and moderately correlated parameters^69,70,71,72, which is the case for our parameter set (Fig. 2B). In general, any residual correlation alters the calculated probability values towards the extremes of 0 and 1⁷². However, we again use parameter scores as comparative figures without the meaning of probabilities and combine the PS_i into the combined label score by taking their geometric mean:

$${LS}=\root{n}\of{P{S}_{1}\cdot \ldots \cdot P{S}_{n}\,}$$

(4)
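Equation (4) is a plain geometric mean, which can be sketched in a few lines. The function name `label_score` is hypothetical and not the labelizer API; the input values are invented for illustration.

```python
import math


def label_score(parameter_scores):
    """Geometric mean of the parameter scores PS_1..PS_n, as in Eq. (4)."""
    n = len(parameter_scores)
    if n == 0:
        raise ValueError("at least one parameter score is required")
    if any(ps < 0 for ps in parameter_scores):
        raise ValueError("parameter scores must be non-negative")
    return math.prod(parameter_scores) ** (1.0 / n)


# Illustrative values: one score per category (CS, SE, SS, CR).
ls_neutral = label_score([1.0, 1.0, 1.0, 1.0])
ls_mixed = label_score([2.0, 0.5, 1.0, 1.0])
ls_high = label_score([16.0, 1.0, 1.0, 1.0])
```

Because the scores are multiplied, a single parameter score of 0 forces LS to 0, and a strongly favorable value in one category can be offset by an unfavorable value in another.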

An important question is which of the 28 parameters to include in the LS. We include one parameter from each of the four categories CS, SE, CR, and SS, for which concrete values were mapped onto the structure of the phosphate binding protein PBP (Fig. 3A). For a rational selection of parameters, we strive (i) to maximize the dynamic range of values for LS, (ii) to maximize the enhancement/suppression level of LS of the successfully labeled residues in the database for high/low LS values and (iii) to maximize the statistical significance level of LS values of random residues over LS values of the labeled residues in the database.

**Fig. 3: Visualization of parameter and label scores.**

Based on these criteria, we were able to identify several parameter sets with predictive power (Supplementary Fig. 5A, B), but also combinations with much less information (Supplementary Fig. 5C). In the end, we decided on one set that resulted in a large difference of the distributions between the random and labeled residues: mean surface distance (SE, #11), conservation score (CS, #13), secondary structure of the labeled residue (SS, #18), and the nature of the mutated amino acid (CR, #25). This set is shown in Fig. 3C and is used as the default for LS calculations in this manuscript and for the associated web server. In the labelizer Python package, any parameter combination can be selected.

We chose the default set out of all well-performing combinations, because of the intuitive nature of all selected parameters and the maximized differences between the mean LS values of all vs. the labeled residues. Both our choice of parameters and the choice of four categories (rather than only two or three) are supported by statistical analysis of the significance, i.e., a t-test and a comparison of the mean values of all vs. labeled parameters for different parameter combinations (Supplementary Table 7). Our selection is further validated by comparing the receiver operating characteristic (ROC curve) for the baseline when retraining with one of four scores removed and the predictive power of each of the scores on their own (Supplementary Fig. 6). Bootstrapping of the final set demonstrates the robustness of our analysis (Fig. 3C). For this final set of parameters, we find that the label scores LS range from 0.2 to 2 for most residues (except 5% failed calculations with LS = 0). The ratio of the LS distribution of successfully labeled residues in the database and all label scores shows that high label scores (> 1.5) are significantly enhanced by a factor of ~ 3-4 for the labeled residues, whereas low label scores (< 0.5) are suppressed by a factor of ~ 10 (Fig. 3C). This suggests that the label score is an informative measure to rank and compare residues for their suitability for labeling with fluorophores. We note here that it would be beneficial to compare the label scores of successfully labeled residues with non-successfully labeled residues in the future. However, we do not have information on non-successfully labeled residues, and only 1% of the considered residues (396 out of 43357) are known as labeled, which should not affect the comparison significantly. We visualize the calculated LS scores for three typical proteins, comprising a soluble protein, a membrane protein, and a DNA binding protein (Fig. 3D–F).

Experimental benchmarking of the label score

To characterize the relation between LS values and experimentally observed behavior, we performed two different analyses of variants of the maltose binding protein (MalE) with single-cysteine labeling sites. MalE is a soluble bilobed protein with an open (apo) and a closed (holo) structure^61,73, which serves as the periplasmic component of the bacterial ABC importer MalFGK₂-E⁷⁴. We visualized LS values for all sites of apo MalE in Fig. 4A and the corresponding distribution in Fig. 4B. The distribution shows an LS value range between 0 and 2; high values for LS appear mostly in positions of MalE near the surface when LS are mapped back to the structure (Fig. 4A).

**Fig. 4: Characterization of LS values with experimental parameters and degree of labeling (DOL) for single cysteine variants of MalE.**

First, we studied a data set of 20 variants of MalE, partially taken from previous work with (relatively) high LS scores representing residues that are good candidates for labeling according to our approach. For these experiments, we used the dyes Alexa546, Alexa647, Cy3, Cy5, sCy5, and ATTO647N and obtained an average degree of labeling (DOL) of 0.82 over all samples after protein labeling and SEC purification (Fig. 4). The DOL was determined using the molar ratio between fluorophore and protein concentration from the Lambert-Beer law: DOL = c(fluorophore)/c(protein). All successfully labeled sites have an average LS of ~ 1.4, and almost 90% of them showed LS values > 1 (Fig. 4B, C). The distribution of the label scores for the successfully labeled sites is different from the distribution of all residues of the MalE protein (Fig. 4B), again confirming that LS provides valuable information about the suitability of protein residues to act as label sites. Our analysis shows, however, no correlation between LS and the experimentally determined DOL (Fig. 4C). This is not too surprising since all tested residues have relatively high label scores, and we focused on mutants with a reasonable chance of labeling and did not include measurements of, e.g., buried residues with low label scores. Furthermore, we do not observe systematic differences between different dyes, suggesting that our method works robustly and is independent of the fluorophore (Fig. 4D and Supplementary Data; LS vs. DOL).
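The DOL definition above can be made concrete with a small sketch based on the Lambert-Beer law, c = A / (ε · l). The correction of the 280 nm absorbance for the dye contribution (`cf_280`) is common practice but is an assumption here, since the text does not state whether it was applied. All function names and numerical values are hypothetical and purely illustrative.

```python
def concentration(absorbance, extinction_coeff, path_length_cm=1.0):
    """Lambert-Beer law: molar concentration c = A / (epsilon * l)."""
    return absorbance / (extinction_coeff * path_length_cm)


def degree_of_labeling(a_dye, a_280, eps_dye, eps_protein,
                       cf_280=0.0, path_length_cm=1.0):
    """DOL = c(fluorophore) / c(protein).

    a_dye:       absorbance at the dye absorption maximum
    a_280:       absorbance at 280 nm
    eps_dye:     dye extinction coefficient (M^-1 cm^-1)
    eps_protein: protein extinction coefficient at 280 nm (M^-1 cm^-1)
    cf_280:      assumed fraction of the dye absorbance that appears at 280 nm
    """
    a_protein = a_280 - cf_280 * a_dye  # remove dye contribution at 280 nm
    if a_protein <= 0:
        raise ValueError("corrected protein absorbance must be positive")
    c_dye = concentration(a_dye, eps_dye, path_length_cm)
    c_protein = concentration(a_protein, eps_protein, path_length_cm)
    return c_dye / c_protein


# Illustrative numbers only; not measured values from this study.
dol = degree_of_labeling(a_dye=0.50, a_280=0.145,
                         eps_dye=250_000, eps_protein=60_000, cf_280=0.05)
```

With these invented inputs, both concentrations come out at 2 µM, so the sketch returns a DOL of about 1.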

In a second set of experiments, we ranked all MalE residues by their label scores and then randomly selected 5 variants each from the best 10% LS scores (referred to as “positive control”) and 5 residues from the worst 10% LS scores (“negative control”). For each of these 10 variants, we characterized the effect of the cysteine mutation in terms of the protein’s expression yield and DOL using the dye sCy5 (all data are provided in the Supplementary Data Excel file and Table 1). All positive controls, i.e., MalE variants comprising residues with high label scores, expressed with high yields (> 15 mg from a 2 L expression culture) and could be labeled with a DOL > 85%. These findings again support that residues with high LS scores can be successfully expressed and labeled, in line with the first analysis of MalE point mutations (Fig. 4).

Table 1 Overview of expression and labeling properties of randomly selected MalE variants

In contrast, two of the negative control variants showed reduced expression yield (L7C with 7.8 mg) or no expression at all (W94C). Furthermore, two of the obtained four negative control variants showed DOL values < 2%. Interestingly, the other two negative control variants showed good expression yields and adequate DOL values, suggesting that not all residues with low LS scores are necessarily unsuitable for labeling. Taking all variants from this set of 10 MalE variants into account, there is a statistically significant correlation between LS score and DOL (p = 0.03 from a two-sample t-test), further supporting the approach presented here.

Extension of the LS score to FRET experiments

To test our prediction tool for the design of a concrete biophysical assay, we extend it to FRET experiments. For this we combine the label score LS with an additional parameter for the rational design of FRET experiments. The central idea is to select residue pairs for FRET experiments that are (i) suitable as label sites based on LS, (ii) are separated by a distance that is close to the Förster radius of the dyes used (for maximum sensitivity) and (iii) that can detect conformational motion. Criteria i/ii are relevant to the case where one protein structure is available, and a residue pair is wanted with a distance close to the Förster radius of the dye pair. In this scenario, the researcher can use combinations of residues in different domains of the protein for maximal sensitivity. We define the FRET score FS of a residue pair {i,j} for a single protein structure as:

$${FS}=\sqrt{L{S}_{i}L{S}_{j}}\cdot \left(1-2\left|\frac{1}{2}-{E}_{i,j}\right|\right),$$

(5)

FS considers the label scores LS_i and LS_j of two residues $i$ and $j$ in the protein structure with corresponding predicted FRET efficiency ${E}_{i,j}$ (see Supplementary Note 2 for details on the FRET efficiency prediction). FS is highest for residue pairs with predicted ${E}_{i,j}=0.5$, i.e., an interdye distance similar to the Förster radius of the dye pair.
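A minimal sketch of Eq. (5) is given below. For the efficiency it uses the standard Förster relation E = 1 / (1 + (R/R₀)⁶) as a stand-in; the paper's own efficiency prediction (Supplementary Note 2) works with simulated, FRET-averaged distances and may differ. The function names and input values are hypothetical.

```python
import math


def fret_efficiency(distance, forster_radius):
    """Standard Foerster relation, used here as a simple stand-in."""
    return 1.0 / (1.0 + (distance / forster_radius) ** 6)


def fret_score(ls_i, ls_j, efficiency):
    """FRET score of a residue pair for a single structure, as in Eq. (5)."""
    return math.sqrt(ls_i * ls_j) * (1.0 - 2.0 * abs(0.5 - efficiency))


# Illustrative pair: both residues have LS = 1.5, and the interdye distance
# equals the Foerster radius (50 Angstrom), so E = 0.5.
e_mid = fret_efficiency(50.0, 50.0)
fs_best = fret_score(1.5, 1.5, e_mid)
fs_far = fret_score(1.5, 1.5, fret_efficiency(100.0, 50.0))
```

The second factor in Eq. (5) falls linearly from 1 at E = 0.5 to 0 at E = 0 or E = 1, so pairs far from the Förster radius are penalized regardless of their label scores.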

If two (interconverting) structures of a protein are available, one is interested in finding FRET pairs that show the largest possible shifts in FRET efficiency. This scenario is encountered when ligand binding, protein-protein interactions, or other macromolecular interactions are studied and requires that distinct structures of the same protein, e.g., ligand-free and ligand-bound, are available. We define the FRET difference score $F{S}_{\triangle }$ of a residue pair {i,j} for two available structures $A$ and $B$ of the same protein as

$$F{S}_{\triangle }=\sqrt{L{S}_{i}^{A}L{S}_{i}^{B}}\sqrt{L{S}_{j}^{A}L{S}_{j}^{B}}\cdot |{E}_{i,j}^{A}-{E}_{i,j}^{B}|,$$

(6)

with the label scores LS of two residues $i$ and $j$ in two protein structures $A,B$ with their corresponding FRET efficiencies ${E}_{i,j}^{A}$ and ${E}_{i,j}^{B}$, respectively.
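Equation (6) can be sketched in the same way and used to rank candidate pairs. The function name `fret_difference_score`, the residue labels, and all numbers below are hypothetical and serve only as an illustration.

```python
import math


def fret_difference_score(ls_i_a, ls_i_b, ls_j_a, ls_j_b, e_a, e_b):
    """FRET difference score of a residue pair for structures A and B, Eq. (6)."""
    label_term = math.sqrt(ls_i_a * ls_i_b) * math.sqrt(ls_j_a * ls_j_b)
    return label_term * abs(e_a - e_b)


# Invented candidate pairs:
# (name, LS_i^A, LS_i^B, LS_j^A, LS_j^B, E^A, E^B)
candidates = [
    ("pair_1", 1.0, 1.0, 1.0, 1.0, 0.2, 0.8),
    ("pair_2", 1.6, 1.6, 1.4, 1.4, 0.45, 0.55),
    ("pair_3", 0.4, 0.4, 0.5, 0.5, 0.1, 0.9),
]
ranked = sorted(
    ((name, fret_difference_score(*values)) for name, *values in candidates),
    key=lambda item: item[1],
    reverse=True,
)
```

In this toy ranking, a pair with a large efficiency shift but poor label scores can rank below a pair with a moderate shift and good label scores, which reflects the intent of combining both terms.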

Accessible volume calculations for FRET labels

To rationally establish a FRET assay with maximum sensitivity, it is necessary to operate at interprobe distances around the Förster radius. A crucial step for the calculation of both FRET scores is, therefore, the ability to predict interdye distances from the protein structures accurately (Fig. 5). The labelizer package supports three models for in silico fluorophore distance predictions. A rough approximation of expected FRET efficiencies can be obtained from the C_β distances between two residues⁵⁵ (Fig. 5A). However, these distances can differ > 10 Å from the actual mean fluorophore positions, due to the size of the fluorophore and the flexible linkers (10–20 Å length) used for fluorophore attachment^75,76. While distance changes are less impacted by such deviations, the absolute distances are significantly affected by the geometry of the labels (Fig. 5C). Neglecting these effects can reduce the sensitivity of a FRET assay by up to a factor of ~ 4 (Fig. 5D and Supplementary Figs. 7, 8).

**Fig. 5: Accurate prediction of interdye distances on proteins and experimental benchmarking of the FRET scores.**

To predict distances between fluorophore labels accurately, it is important to obtain accurate simulations of the accessible volumes (AVs) considering the size and shape of the dyes and their linkers. Molecular dynamics simulations have been successfully used for this purpose^77,78,79, yet they are too slow as a screening tool. Coarse-grained simulations via the FRET-restrained positioning and screening system (FPS), where all positions on a grid are examined to decide whether they can be occupied by a fluorophore of the specified size and linker length, provide AVs that are in very good agreement with experimental values of interdye distances^{49,76,80,81,82,83,84,85,86} (Fig. 5A). Comparing the calculated C_β-distances of the residues with FRET-averaged distances $R_{\langle E\rangle}^{\mathrm{model}}$ from AV simulations reveals deviations of 10 to 15 Å (RMSD, Fig. 5B and Supplementary Fig. 8A), highlighting the need to consider the dye and linker geometry. The computation time required for one pair of dyes using FPS, however, is still rather long for screening purposes, e.g., several hours when > 10,000 residue pairs should be considered (see Supplementary Table 9).

Therefore, we here introduce a simpler and faster distance estimator based on a spherical sector model (SSM) that estimates dye-accessible and dye-inaccessible volumes (Fig. 5B). SSM is used for screening purposes since it is 100 to 1000 times faster than currently available simulations such as FPS. Our algorithm relies on an approximation of the accessible volume by a spherical sector of angle $\alpha$ and radius $R$ representing the linker length of the fluorophore (see Fig. 5C). The atoms of the protein within a radius $R$ from the attachment site (C_β atom) define an inaccessible volume (see Fig. 5B, C, pale red spheres). We find a direct relation between the center of mass of these atoms $\vec{d}^{\prime}$ (inaccessible volume) and the center of mass of the accessible volume $\vec{d}$ (see Supplementary Note 2) as

$$\vec{d}=\left(1-\frac{3}{4}\frac{R}{\left|\vec{d}^{\prime}\right|}\right)\vec{d}^{\prime}.$$

(7)

We included a small correction $\varepsilon$ (~ 0.5 Å for typical fluorophores) to the linker length $\widetilde{R}=R+\varepsilon$ in this formula to compensate for the size of the fluorophore core (Supplementary Note 2 and Supplementary Fig. 7), and we used an estimation to convert the distance of the mean positions to FRET-averaged distances (Supplementary Note 2, Supplementary Fig. 8). To test our method, we performed distance simulations for 100 donor-acceptor pairs in 10 different protein structures, where we altered the linker length and the dye dimension with 35 variations resulting in 35,000 distance simulations in total. Our SSM approach gives results in good agreement with the FPS method with a deviation of ± 3 Å (RMSD, Fig. 5 and Supplementary Fig. 7), which is on the order of the intrinsic distance precision of FRET⁷⁶. The mean-position distances are converted to FRET-averaged distances with an exponential correction factor at small distances (see Methods and Supplementary Fig. 8). The spherical sector method allows screening of > 10,000 FRET-pairs within seconds on a single CPU with < 1 ms calculation time per residue-pair (see Supplementary Table 9). Therefore, our standard settings are to use the SSM method for a first selection of suitable FRET-labeling positions and subsequently refine the best three hundred FRET pairs with the FPS AV-simulations^80,86. Alternatively, our Python package allows calculating the C_β distances (low accuracy) or the FPS-derived distances (long runtime) for all residues by manual selection.
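The core of the spherical sector estimate, Eq. (7), can be sketched as follows. Several choices here are assumptions rather than the package's implementation: atoms are weighted equally when computing the center of mass, atoms are selected within the uncorrected radius R, the corrected length R + ε is used only in the formula, and the conversion to FRET-averaged distances is omitted. The function name `mean_dye_position` is hypothetical.

```python
import math


def mean_dye_position(attachment, atoms, linker_length, epsilon=0.0):
    """Estimate the mean dye position from Eq. (7).

    attachment:    (x, y, z) of the attachment atom (C-beta), in Angstrom
    atoms:         iterable of (x, y, z) protein atom positions
    linker_length: linker length R in Angstrom
    epsilon:       small correction added to R in the formula
    """
    # Positions relative to the attachment site, restricted to radius R.
    nearby = []
    for atom in atoms:
        rel = tuple(a - b for a, b in zip(atom, attachment))
        if math.sqrt(sum(c * c for c in rel)) <= linker_length:
            nearby.append(rel)
    if not nearby:
        raise ValueError("no protein atoms within the linker radius")

    # d': unweighted center of the inaccessible atoms (assumption).
    n = len(nearby)
    d_prime = tuple(sum(v[k] for v in nearby) / n for k in range(3))
    norm = math.sqrt(sum(c * c for c in d_prime))
    if norm == 0.0:
        raise ValueError("direction undefined for a symmetric environment")

    # Eq. (7): d = (1 - 3/4 * R / |d'|) * d', with R replaced by R + epsilon.
    factor = 1.0 - 0.75 * (linker_length + epsilon) / norm
    return tuple(attachment[k] + factor * d_prime[k] for k in range(3))


# Toy geometry: protein atoms on the +x side push the dye towards -x.
position = mean_dye_position(
    attachment=(0.0, 0.0, 0.0),
    atoms=[(4.0, 0.0, 0.0), (6.0, 0.0, 0.0), (100.0, 0.0, 0.0)],
    linker_length=20.0,
)
```

In this toy case the two nearby atoms give d' = (5, 0, 0), the prefactor is 1 − 0.75 · 20 / 5 = −2, and the estimated mean dye position lies at (−10, 0, 0), i.e., on the side facing away from the protein atoms.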

Experimental benchmarking of the FRET score

First, we used the labelizer workflow to establish FRET assays for mechanistic studies of the ABC transporter-related prokaryotic substrate-binding protein PBP^59,60,61,87 (Fig. 6A). As seen in the crystal structures, PBP undergoes a ligand-induced transition from a ligand-free open (pdb: 1OIB, apo) to a ligand-bound closed state (pdb: 1PBP, holo; Fig. 6A). Yet, the ligand-binding mechanism of PBP, i.e., ligand binding before conformational change (induced fit) or conformational change before ligand binding (conformational selection), has not been studied. Thus, our goal was to obtain assays with large changes in FRET efficiency upon the addition of the ligand inorganic phosphate for dye pairs with a Förster radius around 5 nm. We identified multiple suitable residue combinations with maximized positive and negative distance changes based on $F{S}_{\Delta }$ (Fig. 6B). We selected four double-cysteine variants with large predicted shifts from long (low FRET) to shorter (higher FRET) distances upon phosphate binding. These were chosen from the list of 300 refined pairs using the FPS parameters for Alexa Fluor 555-Alexa Fluor 647 (Fig. 6B and Supplementary Table 8). Before conducting FRET experiments, we characterized one of the double-cysteine variants PBP (S3C-I76G-P86C) and the cysteine-less PBP variant PBP (I76G) biochemically by ITC and obtained K_d-values of 10 ± 5 µM for PBP (S3C-I76G-P86C) and 19 ± 6 µM for PBP (I76G); mean ± SD from n = 2 protein preparations (Supplementary Fig. 9). These experiments suggest that introducing the cysteines used for labeling does not affect substrate affinity. The I76G mutation was used in all PBP protein variants presented in this paper (Fig. 6 and Supplementary Fig. 9).

**Fig. 6: Labelizer-based residue selection for FRET experiments and validation.**

Subsequently, we labeled all four PBP variants using established procedures^39,88 (see Methods) and studied freely diffusing molecules with microsecond alternating laser excitation spectroscopy (µsALEX). For labeling, we used the donor-acceptor pair Alexa Fluor 555-Alexa Fluor 647 and the structurally related combination LD555-LD655 (Fig. 6D, E). The success of the labelizer prediction is seen in Fig. 6E and Supplementary Fig. 9, where high-quality smFRET histograms are obtained for all four PBP variants with low FRET in the apo (open conformation, no phosphate) and high FRET in the holo state (closed conformation, 480 µM phosphate). Analyzing the shift from the open to the closed conformation and plotting the closed-state fraction as a function of ligand concentration for PBP (S3C-I76G-P86C) with Alexa Fluor 555-Alexa Fluor 647 yields a K_d of 16 ± 6 µM (Fig. 6F), which is in agreement with results for the unlabeled proteins (Fig. 6C). Similar FRET-assay properties and biochemical characteristics were found for all PBP variants (Supplementary Fig. 10).
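As an illustration of how a K_d can be extracted from the closed-state fraction, the sketch below fits a single-site binding isotherm, fraction = [L]/(K_d + [L]). The single-site model, the function names, and the golden-section search are assumptions for this example, and the data in the usage note are synthetic. In practice, a general-purpose fitter such as `scipy.optimize.curve_fit` would typically be used instead.

```python
import math


def closed_fraction(ligand, kd):
    """Single-site binding isotherm (assumed model); same units for both."""
    return ligand / (kd + ligand)


def fit_kd(ligand_conc, fractions, lo=1e-3, hi=1e4, iterations=200):
    """Least-squares K_d via golden-section search on log10(K_d)."""
    def sse(log_kd):
        kd = 10.0 ** log_kd
        return sum((closed_fraction(c, kd) - f) ** 2
                   for c, f in zip(ligand_conc, fractions))

    a, b = math.log10(lo), math.log10(hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)
```

For example, `fit_kd(conc, [closed_fraction(c, 16.0) for c in conc])` with `conc = [1, 5, 10, 20, 50, 100, 480]` (in µM, synthetic and noise-free) recovers a K_d close to 16 µM.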

Besides demonstrating the success of the labelizer procedure, these experiments provide previously unavailable information on the ligand-binding mechanism of PBP. The lack of a pronounced closed-state population in the absence of ligand (Supplementary Fig. 9, apo) and the engulfed nature of the ligand in the closed state support the idea that PBP is likely to use a ligand-binding mechanism like other structurally related SBPs^60,61,73.

To go beyond a qualitative assessment of the labelizer routine, we analyzed a large pool of smFRET experiments of different MalE double-cysteine variants to quantitatively benchmark the scores FS and $F{S}_{\Delta }$. In detail, we analyzed 34 data sets comprising published data^39,80 and new data generated for this study (Supplementary Figs. 11, 12). For these, accurate FRET efficiencies of MalE in both the apo and holo states were determined, including the respective interprobe distances and their change upon maltose binding (Supplementary Data). This data set covers an experimental interprobe distance range from 3–7 nm for three distinct dye pairs with Förster radii of 5.1 nm (Alexa Fluor 555-Alexa Fluor 647), 5.8 nm (ATTO532-ATTO643) and 6.5 nm (Alexa Fluor 546-Alexa Fluor 647) and E values ranging from 0.2–0.9. An overview of experimentally determined and theoretical Förster radii is provided in Supplementary Fig. 11D.
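The relation between interprobe distance, Förster radius, and FRET efficiency referred to here is the standard Förster equation, $E = 1/(1+(r/R_0)^6)$. The sketch below evaluates it for a single fixed distance; the function names are illustrative, and the averaging over dye positions that underlies the FRET-averaged distances in this work is not included.

```python
def fret_efficiency(distance_nm, r0_nm):
    """Foerster equation for one fixed donor-acceptor distance."""
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


def fret_distance(efficiency, r0_nm):
    """Inverse of the Foerster equation; requires 0 < efficiency < 1."""
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Efficiencies at the edges of a 3-7 nm distance range for the three
# Foerster radii quoted in the text (fixed-distance approximation).
for r0 in (5.1, 5.8, 6.5):
    print(r0, round(fret_efficiency(3.0, r0), 2), round(fret_efficiency(7.0, r0), 2))
```

A pair whose distance equals the Förster radius has E = 0.5, which is where the efficiency is most sensitive to distance changes.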

Consistent with expectations, the calculated FS values correlate linearly with the difference of the experimentally determined mean FRET efficiency from 0.5 (Fig. 5E). Similarly, we observe a linear correlation between computed $F{S}_{\Delta }$ values and the experimentally observed change in FRET efficiency |E_holo-E_apo| upon ligand binding (Fig. 5E). Whereas pairs with large FS and $F{S}_{\Delta }$ values are desirable to detect changes upon ligand binding, pairs with high FS values of the two individual conformations, but $F{S}_{\Delta }\approx 0$ (MalE 84/352, Supplementary Fig. 12), can provide an important experimental control. Such pairs have a distance close to the Förster radius with (almost) no change in FRET efficiency upon conformational change. They can serve as negative controls to ensure that a protein or conformational changes do not influence fluorophores, e.g., via altered photophysics, lifetime and quantum yield changes, or for the characterization of quenchers such as metal ions⁸⁹, which can affect FRET efficiencies without conformational change.

Importantly, all analyzed fluorophore-labeled MalE variants used for smFRET had LS values > 1 and showed wildtype-like maltose affinities with K_d-values of ~ 1-2 µM (Supplementary Fig. 11). Taken together, these analyses provide strong support for the idea that the LS is a useful indicator to identify residues that (i) allow fluorophore attachment, (ii) preserve protein function and, in combination with FS, (iii) enable systematic design of FRET assays.

Discussion

Here, we present a general strategy to identify optimal residues for protein labeling using a naïve Bayes classifier. Based on a literature screening and bioinformatics analysis of 104 proteins with 396 successfully labeled residues, we identified a set of four parameters, which we combined into a label score to quantitatively rank residues according to their suitability as label sites. We show, using data from the literature and new complementary experiments, the predictive power of this labeling score and extend the method to systematically select residue pairs for FRET experiments, which we believe can be extended at a later stage to consider the specific properties of the label and also other biophysical assays beyond FRET.

To widely disseminate our methodology, we provide a Python package called “labelizer”, which implements the analysis of the pdb-structure, label score calculation, and FRET assay scoring. The labelizer analysis routine can be modified and extended, to accommodate specific research questions and to build upon the work presented here. To make the methodology widely available to non-expert users, all key functionalities are available as a web server with an intuitive and user-friendly interface https://labelizer.bio.lmu.de/. The web server supports the label score calculation and its use for FRET experiments with default parameters for the most frequently used fluorophores. For this purpose, pdb-files can be loaded automatically and preprocessed from the pdb-database. We further retrieve conservation scores directly from an independent installation of the ConSurf server^62,63 without the need of uploading any information (except when modified or user-specific pdb files should be used). The web server visualizes the different scores in an interactive 3D structure viewer and provides a table with filter options for customized restrictions upon residue selection. Furthermore, human-readable result files (csv, json) are available for subsequent analysis. With the developed method, we hope to provide scientists in various research fields (biochemistry, molecular biology, bioimaging, high-resolution optical microscopy, and single-molecule biophysics) with a tool that enables them to systematically design assays and justify the residue selection.

A challenging aspect of our analysis is the final selection of residues by the user based on the labelizer output. Since this step is decisive for which residues are used in experiments, the selection goes hand in hand with an assessment and interpretation of the LS/FS value distributions of the analyzed protein. It is difficult to define clear threshold values for residues to be excluded based on LS/FS, yet our findings empirically suggest that residues with LS values < 1 are less likely to be useful in experiments. Since the FRET-score values additionally depend on the underlying LS distribution, it is difficult to give general recommendations. We stress that the user of the algorithm should inspect the specific LS/FS distributions for each protein. For the residues ranked highest, we recommend the user to verify this selection with prior (expert) knowledge on the protein. A key question would be whether the highly-ranked residues, i.e., those favored by the labelizer, are known to negatively impact secondary structure, ligand-binding, biomolecular interactions, or protein folding. Additional information might also come from other biophysical approaches such as CD spectroscopy, FTIR, MD simulations or EPR studies, considering any information that can help to assess if key residues, which should not be altered, are actually (falsely) suggested by our algorithm.

An interesting future direction for further development of the labelizer is to include more parameters (e.g., also fluorophore-dependent ones) with a potential differentiation of residues based on the selected fluorophores related to the specific charge environment on the protein or proximity to specific amino acids, e.g., tryptophan or histidine. We also plan to combine different parameter scores to improve the predictive ability of the labelizer, which might happen within one category, e.g., via simultaneous use of half-sphere exposure (HSE) and relative surface area (RSA) to combine the amino-acid direction and surface area, or between categories, e.g., solvent exposure and cysteine resemblance. Furthermore, normal mode analysis (e.g., NMSim webserver^90,91), mutation-specific energy analysis (e.g., SDM^92,93), or tailored MD-simulations⁹⁴ could be used to identify FRET-residue pairs for analysis of conformational motion when only one protein structure is available. The concept of FRET scores could also be extended towards other fluorescence assay types related to fluorophore quenching^95,96, protein-induced fluorescence enhancement^97,98, and others^99,100. We also envision applying the labelizer approach in related applications, such as EPR-distance measurements, since the methods share similar requirements in regard to residue selection^37,38,39.

Another direction for future improvement and extension of the database and the algorithm would be to revise the available PS values using an extended database, in which positions with low or no labeling yield would be an important new class of information. Such an improved training data set can be obtained via a feedback loop, where researchers supply information on successfully and unsuccessfully labeled residues via a form planned on our website. Unsuccessful results are of particular interest since negative results are rarely found in the literature (mainly successful results are published), and we were not able to collect enough negative examples from researchers directly. Therefore, we call on the scientific community to use the labelizer and to provide feedback on the approach and on positive and negative results, where labeling of specific residues was successful or failed, respectively. Finally, once a much larger dataset of labeled and non-labeled residues is available, applications of other machine learning procedures (e.g., support vector machines or neural networks) could significantly enhance the predictions.

Methods

Database generation

To identify parameters with predictive power for the possibility to label residues in proteins, we created a dataset based on a manual screening of more than 1000 publications and preprints that were available on or before December 2020, with a focus on the field of single-molecule microscopy and single-molecule FRET. The papers were screened to identify proteins and residues that were labeled successfully with a fluorophore and that satisfied the following criteria: (i) the protein had a structure available in the PDB database (with PDB identification code); (ii) the protein was labeled via site-specific mutagenesis and introduction of cysteines or UAAs; (iii) the protein was successfully labeled with synthetic organic fluorophores (or spin labels) and used preferentially in single-molecule assays. In order to increase the number of database entries, we complemented our search whenever some information was missing. Typical cases were missing PDB identification codes or residue numbers. In these cases, the required information was obtained from other referenced papers, often from the same research group.

For each successfully labeled protein variant, which fulfilled the aforementioned criteria, the following information was collected:

Protein (PDB identification code)
Soluble or membrane protein
Stoichiometry (monomers, dimer, complexes)
Homology model (true/false)
Labeled residue (chain and residue number)
Mutation (cysteine or UAA)
Assay type (smFRET, imaging, bulk-FRET, other)
Name of labeled fluorophores
Research group
Publication reference

Additional notes were gathered to account for issues such as: (i) dimer and polymer protein structures, which were crystallization artefacts and needed to be deleted for structural analysis; (ii) missing residues in the protein structure, i.e., when parts of the protein were not resolved completely; (iii) inconsistencies or missing information that we identified. The final database with information on those positions in proteins that were successfully labeled had 396 successfully labeled residues in 112 different chains in 104 different protein structures (Supplementary Data). As a comparison, we used a representative set of proteins (PDBselect, November 2017)^67,68 as a random reference database to check how representative the analyzed pdb structures are. To this end, we randomly selected 300 chains (out of 4184 chains) from the PDBselect database and performed the identical analysis with those pdb files. This important comparison shows that the selection of labeled proteins and residues is representative of the PDB content, indicated by only minor deviations between both $P(s)$ distributions, mostly within statistical errors (see Supplementary Fig. 2).

Parameter frequency calculation

For every extracted parameter, the relative frequency defines a parameter score

$${PS}=\frac{P({s|l})}{P(s)},$$

(8)

where $P(s)$ is the probability distribution of the score $s$ (calculated from the 112 chains of the database) and $P({s|l})$ is the probability distribution of the score given that the residue was labeled (calculated from the 396 successfully labeled residues).

The error bars ${\sigma }_{{sl}}$ and ${\sigma }_{s}$ for $P({s|l})$ and $P(s)$, respectively, were determined from Poissonian counting statistics as ${\sigma }_{{sl}}=\sqrt{P\left(s | l\right)/n}$ and ${\sigma }_{s}=\sqrt{P(s)/n}$ with $n$ being the total number of evaluated residues. The error bar ${\sigma }_{{PS}}$ for ${PS}$ follows from standard error propagation rules:

$${\sigma }_{{PS}}=\sqrt{\frac{{\sigma }_{{sl}}^{2}}{P{\left(s | l\right)}^{2}}+\frac{{\sigma }_{s}^{2}}{P{\left(s\right)}^{2}}}{PS} .$$

(9)
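Equations (8) and (9) can be sketched for binned data as follows. This is an illustrative example rather than the package code: the function name and the input layout (per-bin counts) are hypothetical, and $n$ is taken as the total count of the respective distribution (all residues for $P(s)$, labeled residues for $P(s|l)$), which is an assumption about how the counting statistics are applied.

```python
import math


def parameter_scores(counts_all, counts_labeled):
    """Return a list of (PS, sigma_PS) per bin from raw residue counts."""
    n_all = sum(counts_all)
    n_lab = sum(counts_labeled)
    results = []
    for c_all, c_lab in zip(counts_all, counts_labeled):
        p_s = c_all / n_all    # P(s): all residues in the database chains
        p_sl = c_lab / n_lab   # P(s|l): successfully labeled residues
        if p_s == 0.0:
            results.append((float("nan"), float("nan")))  # PS undefined
            continue
        ps = p_sl / p_s        # Eq. (8)
        if p_sl == 0.0:
            results.append((0.0, float("nan")))  # relative error undefined
            continue
        sigma_sl = math.sqrt(p_sl / n_lab)  # Poissonian counting error
        sigma_s = math.sqrt(p_s / n_all)
        # Eq. (9): propagation of the two relative errors
        sigma_ps = ps * math.sqrt(sigma_sl ** 2 / p_sl ** 2
                                  + sigma_s ** 2 / p_s ** 2)
        results.append((ps, sigma_ps))
    return results
```

Under these assumptions the relative error of PS in a bin reduces to $\sqrt{1/c_{l}+1/c_{all}}$ with the raw counts $c_l$ and $c_{all}$, so sparsely populated bins carry large error bars.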

Parameter information analysis

To evaluate the amount of information a single parameter score carries, we used three measures to estimate the deviation from a uniform distribution, which corresponds to the case of zero information.

We used the mean squared deviation (MSD) of the parameter score from 1

$${MSD}({PS})=\frac{{\sum }_{i=1}^{n}{\left({PS}\left(i\right)-1\right)}^{2}}{n}$$

(10)

with n the number of bins/categories.

We used the Gini coefficient of the parameter score

$${gini}({PS})=\frac{\frac{n-1}{2}{\sum }_{i=1}^{n}{PS}\left(i\right)-{\sum }_{i=2}^{n}{\sum }_{j=1}^{i-1}{PS}\left(j\right)}{\frac{n}{2}\mathop{\sum }_{i=1}^{n}{PS}\left(i\right)}$$

(11)

with n the number of bins/categories.

We used an adapted Shannon entropy accounting for the number of bins/categories as

$$H({PS})=\frac{-{\sum }_{i=1}^{n}\widetilde{{PS}}\left(i\right)\ln \left(\widetilde{{PS}}\left(i\right)\right)}{\ln \left(n\right)}$$

(12)

with a normalized parameter score $\widetilde{{PS}}\left(i\right)={PS}(i)/\left({\sum }_{j=1}^{n}{PS}(j \, )\right)$ and n the number of bins/categories.
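The three measures in Eqs. (10) to (12) can be sketched as below. The function names are illustrative. Two points are assumptions not stated in the text: the Gini expression is evaluated on values sorted in ascending order, and bins with PS = 0 contribute zero to the entropy (the usual convention $0\ln 0=0$).

```python
import math


def msd(ps):
    """Eq. (10): mean squared deviation of PS from 1."""
    return sum((p - 1.0) ** 2 for p in ps) / len(ps)


def gini(ps):
    """Eq. (11), evaluated on ascending-sorted values (assumption)."""
    values = sorted(ps)
    n = len(values)
    total = sum(values)
    # Double sum over i = 2..n and j = 1..i-1, i.e. the sum of the
    # partial sums S_1, ..., S_{n-1} of the sorted values.
    partial = 0.0
    double_sum = 0.0
    for v in values[:-1]:
        partial += v
        double_sum += partial
    return ((n - 1) / 2.0 * total - double_sum) / (n / 2.0 * total)


def normalized_entropy(ps):
    """Eq. (12): Shannon entropy of normalized PS, divided by ln(n)."""
    total = sum(ps)
    probs = [p / total for p in ps]
    return -sum(p * math.log(p) for p in probs if p > 0.0) / math.log(len(ps))
```

For a flat parameter score (all PS equal to 1), MSD and Gini are 0 and the normalized entropy is 1, which corresponds to the zero-information case.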

Parameter cross-correlation

To evaluate the mutual statistical dependence of all calculated parameters, we use three different types of correlation coefficients, depending on the datatypes of the parameters:

We used standard Pearson correlation for a pair of numeric parameters

$${r}_{{NN}}=\frac{\mathop{\sum }_{i=1}^{n}({x}_{i}-\bar{x})({y}_{i}-\bar{y})}{\sqrt{\mathop{\sum }_{i=1}^{n}{\left({x}_{i}-\bar{x}\right)}^{2}}\,\sqrt{{\sum }_{i=1}^{n}{\left({y}_{i}-\bar{y}\right)}^{2}}},$$

(13)

with $n$ different residues with parameter scores ${x}_{i},\,{y}_{i}$ and corresponding mean values $\bar{x}=1/n{\sum }_{i=1}^{n}{x}_{i}$ (and $\bar{y}$ accordingly)¹⁰¹.

We used the interclass correlation for a pair of a categorical parameter and a numeric parameter¹⁰². The $n$ data points are grouped in $k$ categories ${c}_{i}$ with $i\in \{1,2,\ldots,k\}$ of length ${n}_{i}$.

$${r}_{{CN}}=\frac{{MST}-{MSE}}{{MST}+\left({n}_{0}-1\right){MSE}},$$

(14)

with

$${MST}=\frac{{\sum }_{i=1}^{k}{n}_{i}{\left(\bar{{x}_{i}}-\bar{x}\right)}^{2}}{k-1},$$

(15)

$${MSE}=\frac{\mathop{\sum }_{i=1}^{k}\mathop{\sum }_{j=1}^{{n}_{i}}{\left({x}_{i,j}-\bar{{x}_{i}}\right)}^{2}}{n-k},$$

(16)

$${n}_{0}=\frac{n-\mathop{\sum }_{i=1}^{k}{n}_{i}^{2}/n}{k-1},$$

(17)

where $\bar{{x}_{i}}$ is the mean of category $i$, $\bar{x}$ the mean of all data, ${x}_{i,j}$ the j^th numeric value in category ${c}_{i}$, and $({n}_{0}-1)$ the averaged interclass degree of freedom¹⁰².

We used Cramér’s V for a pair of categorical parameters¹⁰³. The data are grouped in the two sets of categories ${c}_{i}$ with $i\in \{1,2,\ldots,k\}$ and ${d}_{j}$ with $j\in \{1,2,\ldots,l\}$.

$${r}_{{CC}}=\sqrt{\frac{{\chi }^{2}}{n(\min \left(k,l\right)-1)}},$$

(18)

with

$${\chi }^{2}=\mathop{\sum }_{i=1}^{k}{\sum }_{j=1}^{l}\frac{{\left({n}_{i,j}-{\widetilde{n}}_{i,j}\right)}^{2}}{{\widetilde{n}}_{i,j}},$$

(19)

where ${\widetilde{n}}_{i,j}=({\sum }_{j=1}^{l}{n}_{i,j})({\sum }_{i=1}^{k}{n}_{i,j})/n$ is the expected count, $n$ is the total number of residues, and ${n}_{i,j}$ is the number of residues that belong to both class ${c}_{i}$ and class ${d}_{j}$. The cross-correlation was calculated for every combination of the 28 extracted parameters to identify dependencies, as shown in Fig. 2.
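The three coefficients can be sketched in plain Python as follows. Function names and input layouts are illustrative. For the categorical-numeric case, the between-category mean square is taken in its standard one-way ANOVA form, $\sum_i n_i(\bar{x}_i-\bar{x})^2/(k-1)$, which is an assumption about the intended definition of MST.

```python
import math
from collections import Counter


def pearson(x, y):
    """Eq. (13): Pearson correlation of two numeric parameters."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def categorical_numeric_correlation(groups):
    """Eqs. (14)-(17): groups is a list of numeric lists, one per category."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    mst = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, means)) / (k - 1)
    mse = sum((x - m) ** 2
              for g, m in zip(groups, means) for x in g) / (n - k)
    n0 = (n - sum(len(g) ** 2 for g in groups) / n) / (k - 1)
    return (mst - mse) / (mst + (n0 - 1) * mse)


def cramers_v(labels_a, labels_b):
    """Eqs. (18)-(19): Cramer's V of two categorical parameters."""
    n = len(labels_a)
    observed = Counter(zip(labels_a, labels_b))
    rows, cols = Counter(labels_a), Counter(labels_b)
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    # Requires at least two categories in each parameter.
    return math.sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))
```

Choosing the coefficient by data type allows one dependency matrix to be filled for mixed numeric and categorical parameters, although values from different coefficient types are only loosely comparable.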

Parameter selection criteria

The selection of a suitable parameter set is based on two criteria. First, a joint correlation for any combination of parameters is calculated as

$${r}_{{set}}=\sqrt{{\sum }_{i=1}^{n}{r}_{{ij}}},$$

(20)

with ${r}_{{ij}}$ the correlation of parameter $i$ with $j$ and $n$ the number of selected parameters (in our case, 4). Secondly, we used three measures to characterize our parameter sets:

We calculate the t value of the calculated label scores as

$$t=\frac{{\mu }_{l}-{\mu }_{{all}}}{\sqrt{{SE}{M}_{l}^{2}+{SE}{M}_{{all}}^{2}}}$$

(21)

with the mean values ${\mu }_{l},\,{\mu }_{{all}}$ and standard error of the mean ${SE}{M}_{l},{SE}{M}_{{all}}$ of the labeled/all residues, respectively.

The dynamic range was calculated as the standard deviation of the logarithmic values $\sigma \left(\log \left(L{S}_{{all}}\right)\right)$.

The suppression/enhancement of the labeling score of labeled residues for small/large values was calculated from the slope of a linear least-squares fit to the logarithm of the label score ${LS}$ and the label score distribution of labeled residues and all residues. The data are binned into logarithmic bins with bin intervals $\left[{1.5}^{i},{1.5}^{i+1}\right]$ for $i\in \{-12,\ldots,11\}$ and fitted to the function

$$\log \left(\frac{P\left({LS|l}\right)}{P\left({LS}\right)}\right)=m\log \left({LS}\right)+\log (c)$$

(22)

where LS is the label score and $P\left({LS}\right)$ and ${P}\left({LS|l}\right)$ are the probability distributions of the label score of all and the labeled residues, respectively. Of the fitted values $m,c$, the slope $m$ is used as the analysis parameter.
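The three characterization measures can be sketched as follows. The function names are illustrative, and several details are assumptions: natural logarithms, the sample standard deviation, the geometric bin center as the LS value of a bin, and the omission of bins that are empty for either distribution.

```python
import math
import statistics
from collections import Counter


def t_value(ls_labeled, ls_all):
    """Eq. (21): difference of the means in units of the combined SEM."""
    sem_l = statistics.stdev(ls_labeled) / math.sqrt(len(ls_labeled))
    sem_all = statistics.stdev(ls_all) / math.sqrt(len(ls_all))
    diff = statistics.mean(ls_labeled) - statistics.mean(ls_all)
    return diff / math.sqrt(sem_l ** 2 + sem_all ** 2)


def dynamic_range(ls_all):
    """Standard deviation of log(LS) over all residues (LS > 0)."""
    return statistics.stdev([math.log(v) for v in ls_all])


def enrichment_slope(ls_labeled, ls_all, base=1.5, i_min=-12, i_max=11):
    """Eq. (22): least-squares fit of log(P(LS|l)/P(LS)) vs log(LS)."""
    def bin_counts(values):
        counts = Counter()
        for v in values:
            i = math.floor(math.log(v) / math.log(base))
            if i_min <= i <= i_max:
                counts[i] += 1
        return counts

    c_lab, c_all = bin_counts(ls_labeled), bin_counts(ls_all)
    xs, ys = [], []
    for i in range(i_min, i_max + 1):
        if c_lab[i] and c_all[i]:
            ratio = (c_lab[i] / len(ls_labeled)) / (c_all[i] / len(ls_all))
            xs.append((i + 0.5) * math.log(base))  # log of geometric center
            ys.append(math.log(ratio))
    if len(xs) < 2:
        raise ValueError("need at least two populated bins for the fit")
    mx, my = statistics.mean(xs), statistics.mean(ys)
    m = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return m, math.exp(my - m * mx)  # slope m and prefactor c
```

A positive slope indicates that labeled residues are enriched at high label scores and depleted at low ones, which is the behavior a useful parameter set should show.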

Statistics & reproducibility

No statistical method was used to predetermine sample size. No data were excluded from the analyses. The experiments were not randomized except for new MalE mutants added during the revision of this paper. The investigators were not blinded to allocation during experiments and outcome assessment.

Protein production and labeling

In the current study, we used single cysteine variants of MalE (Fig. 4) that were obtained and fluorophore-labeled according to published procedures^60,61. PBP double cysteine variants were produced for this study. The coding sequence for the E. coli K12 phoS gene (Genbank coding sequence NC_000913.3, 3910485 - 3911525 complement, protein accession number NP_418184.1), with amino-acid changes (A17C and A197C) corresponding to the rho-PBP fluorescent biosensor variant¹⁰⁴ was synthesized (Invitrogen GeneArt Gene Synthesis, Thermo Fisher) without its N-terminal signal sequence (25 amino acid N-terminal deletion). This construct utilized flanking NdeI/XhoI sites and was subcloned into the pET20b expression vector. The resulting construct encoded a C-terminal His-tag fusion. The S3C-P86C-PBP mutant, with the additional I76G mutation that reduces the wild-type affinity (K_d 0.07 µM) of the protein for inorganic phosphate by ~ 200-fold⁸⁷ was created using a protocol based on the Stratagene Quikchange protocol. As a control, a variant was also created with only the I76G mutation.

E. coli BL21 (DE3) pLysS cells transformed with the S3C-P86C-PBP mutant expression plasmid (or the plasmid for the control variant) were used to inoculate Terrific Broth (TB; Carl Roth, Karlsruhe, Germany) supplemented with 100 µg/ml carbenicillin (Carl Roth) and 0.2% glucose to an optical density at 600 nm (OD₆₀₀) of 0.1 AU at 37 °C with shaking at 200 rpm. At an OD₆₀₀ of ~ 0.3 AU, isopropyl β-D-1-thiogalactopyranoside (IPTG, Carl Roth) was added to a final concentration of 0.5 mM, followed by ~ 24 h incubation. Cells were harvested by centrifugation (5000 × g, 20 min, 4 °C) at a final culture OD₆₀₀ of 3-4 AU, resuspended in 35 ml 20 mM HEPES pH 7.5, 300 mM NaCl, 10% glycerol containing protease inhibitor (cOmplete, EDTA-free Protease Inhibitor Tablets, Sigma; 1 tablet/50 ml solution), and frozen and stored at − 80 °C.

The resulting cell suspension was thawed, supplemented with 5 mM β-mercaptoethanol (β-ME) and 10 mM imidazole (Carl Roth), and then sonicated (Branson Digital Sonifier 450, Danbury, CT, USA) on ice for 10 min (Amplitude, 25%; 0.5 sec on and 0.5 sec off). Insoluble fractions containing cell debris were separated by centrifugation (165,000 × g for 1 h at 4 °C). The soluble fraction was incubated with 1.5 ml of Ni Sepharose 6 Fast Flow resin (GE Healthcare) for 1 h at 4 °C. The resin with bound protein was then washed with 80 ml of buffer containing 25 mM imidazole. Bound protein was eluted in 10 ml buffer with 500 mM imidazole. The elution fraction was concentrated to < 0.5 ml using a Viva Spin 20 concentrator with a 10 kDa MWCO (Th. Geyer, Renningen, Germany), and subjected to further purification by size-exclusion chromatography (SEC; using ÄKTA pure system, and Superdex 75 Increase 10/300 GL column (GE Healthcare)) in 20 mM Tris-HCl pH 8.0, 100 mM NaCl, 10 mM imidazole. The final purified proteins were >95% pure as assessed by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE).

His-tagged MalE and S3C-P86C-PBP proteins were labeled according to published procedures^60,61. The proteins were incubated with 1 mM DTT to reduce cysteine residues. Following dilution to lower the DTT concentration to <0.05 mM (so as not to interfere with the binding of protein to the metal-affinity resin), the proteins were immobilized on 200 µl of Ni Sepharose resin. The resin was then washed with 12 ml of 50 mM Tris-HCl pH 7.4-8.0, 50 mM KCl, 5% glycerol for MalE and SBD2 (Buffer A), and 20 mM Tris-HCl pH 8.0, 100 mM NaCl, 10 mM imidazole for PBP. 28 nmol of PBP were then incubated overnight with 50 nmol of each fluorophore dissolved in 2 ml of the appropriate buffer. Unreacted fluorophore for MalE and SBD2 was removed by washing the resin with 12 ml of Buffer A followed by 12 ml of Buffer A containing 50% glycerol. For PBP, a single 12 ml wash was performed. Bound MalE and SBD2 were eluted with 0.5 ml of Buffer A containing 500 mM imidazole, whereas PBP was eluted with 1 ml of buffer with 500 mM imidazole. The labeled proteins were further purified by size-exclusion chromatography (using ÄKTA pure system, and Superdex 75 Increase 10/300 GL column (GE Healthcare)). The absorbance of protein (280 nm) and fluorophore (532 and 640 nm) was used for the determination of molar concentrations in samples and labeling efficiency, i.e., [fluorophore]/[protein] × 100.
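The labeling efficiency defined above can be computed from absorbance readings via the Beer-Lambert law, as in the following sketch. The function names are illustrative, the extinction coefficients must be supplied by the user, and the optional correction factor `cf280` for dye absorbance at 280 nm is an assumption that is not described in the text (its default of 0 applies no correction).

```python
def molar_concentration(absorbance, extinction_coefficient, path_length_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), in mol/l."""
    return absorbance / (extinction_coefficient * path_length_cm)


def labeling_efficiency(a280, a_dye, eps_protein, eps_dye,
                        cf280=0.0, path_length_cm=1.0):
    """Return [fluorophore]/[protein] * 100 in percent.

    a280        -- absorbance at 280 nm
    a_dye       -- absorbance at the dye maximum (e.g., 532 or 640 nm)
    eps_protein -- protein extinction coefficient at 280 nm (M^-1 cm^-1)
    eps_dye     -- dye extinction coefficient at its maximum (M^-1 cm^-1)
    cf280       -- hypothetical fraction of a_dye that appears at 280 nm
    """
    protein = molar_concentration(a280 - cf280 * a_dye, eps_protein,
                                  path_length_cm)
    dye = molar_concentration(a_dye, eps_dye, path_length_cm)
    return 100.0 * dye / protein
```

For a double-labeled sample, the function would be evaluated once per fluorophore with the respective absorbance and extinction coefficient.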

Affinity measurements: Isothermal titration calorimetry and MST

Binding affinities of I76G-PBP and unlabeled S3C-P86C-PBP for inorganic phosphate were determined with a MicroCal PEAQ-ITC microcalorimeter (Malvern Panalytical) at 25 °C. Protein from a diluted solution was concentrated to ~ 30 µM using a Viva Spin 6 concentrator with a 10 kDa MWCO. The filtrate was used to prepare the phosphate solution at 450 µM. The reaction cell was filled with the protein solution and titrated in 19 steps of 2 µl each of phosphate solution in 160 s intervals. A baseline control was obtained from measurements made with protein-free filtrate in the reaction cell, and this baseline was subtracted from the experimental thermograms. Data were fitted to a single binding site model using the Setup MicroCal PEAQ-ITC Analysis Software provided by the manufacturer.

smFRET spectroscopy and data analysis

smFRET experiments of PBP and MalE were carried out on a home-built ALEX setup^60,88: PBP was studied by diluting the labeled protein to concentrations of ≈ 80 pM in a 100 µl drop of buffer (20 mM Tris-HCl pH 8.0, 100 mM NaCl, 10 mM imidazole) on a coverslip supplemented with the ligand phosphate as described in the text and figures. Before each experiment, the coverslip was passivated for 3 minutes with a 1 mg/ml BSA solution in the buffer. The measurements were performed without a photostabilizer. The fluorescent donor molecules were excited by a diode laser at 532 nm operated at 60 µW at the sample in alternation mode (50 µs alternating excitation and a 100 µs alternation period). The fluorescent acceptor molecules were excited by a diode laser at 640 nm operated at 25 µW at the sample. Data analysis was performed using a home-written software package as described in reference⁶⁰. Single-molecule events were identified using an all-photon-burst-search algorithm with a threshold of 15, a time window of 500 µs, and a minimum total photon number of 150¹⁰⁵. E-histograms of double-labeled FRET species with LD555 and LD655 were extracted by selecting 0.25 < S < 0.75. E-histograms of the open state without ligand (apo) and closed state with saturation of the ligand (holo) were fitted with a Gaussian distribution $A\,{e}^{-\frac{{\left(E-\mu \right)}^{2}}{2{\sigma }^{2}}}$.
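The burst selection described above can be illustrated with the standard uncorrected ALEX quantities, the apparent FRET efficiency $E^{*}=F_{DA}/(F_{DD}+F_{DA})$ and the stoichiometry $S^{*}=(F_{DD}+F_{DA})/(F_{DD}+F_{DA}+F_{AA})$. The sketch below is not the home-written analysis software: the function name and the burst layout are hypothetical, the burst search itself is not reproduced, and no background, crosstalk, or detection-efficiency corrections are applied.

```python
def select_fret_bursts(bursts, min_photons=150, s_min=0.25, s_max=0.75):
    """Return apparent E values of bursts passing the photon and S filters.

    bursts -- iterable of (f_dd, f_da, f_aa) photon counts per burst:
              donor emission after donor excitation, acceptor emission
              after donor excitation, acceptor emission after acceptor
              excitation.
    """
    selected = []
    for f_dd, f_da, f_aa in bursts:
        total = f_dd + f_da + f_aa
        if total < min_photons or f_dd + f_da == 0:
            continue
        stoichiometry = (f_dd + f_da) / total
        if s_min < stoichiometry < s_max:  # double-labeled species only
            selected.append(f_da / (f_dd + f_da))
    return selected
```

The resulting list of apparent efficiencies is what would be histogrammed and fitted with the Gaussian distribution given above.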

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Primary research data generated in this study have been deposited in zenodo under accession code https://zenodo.org/records/14965046.

Code availability

The web server with an intuitive user interface and default analysis settings is available under https://labelizer.bio.lmu.de/. The software is available as a Python package “labelizer” as source code under https://github.com/ChristianGebhardt/labelizer. The databases and additional information can be accessed from https://github.com/ChristianGebhardt/labelizer-supplement or from the online version of the paper.

References

Weiss, S. Fluorescence spectroscopy of single biomolecules. Science 283, 1676–1683 (1999).
Algar, W. R., Hildebrandt, N., Vogel, S. S. & Medintz, I. L. FRET as a biomolecular research tool - understanding its potential while avoiding pitfalls. Nat. Methods 16, 815–829 (2019).
Joo, C., Balci, H., Ishitsuka, Y., Buranachai, C. & Ha, T. Advances in single-molecule fluorescence methods for molecular biology. Annu. Rev. Biochem. 77, 51–76 (2008).
Sauer, M., Hofkens, J. & Enderlein, J. Handbook of Fluorescence Spectroscopy and Imaging. (Wiley Online Library, 2011).
Selvin, P. R. & Ha, T. Single Molecule Techniques: A Laboratory Manual. (Cold Spring Harbor Laboratory Press, 2008).
Gonçalves, M. S. T. Fluorescent labeling of biomolecules with organic probes. Chem. Rev. 109, 190–212 (2009).
Rodriguez, E. A. et al. The growing and glowing toolbox of fluorescent and photoactive proteins. Trends Biochem. Sci. 42, 111–129 (2017).
Johnsson, N. & Johnsson, K. Chemical tools for biomolecular imaging. ACS Chem. Biol. 2, 31–38 (2007).
Xue, L., Karpenko, I. A., Hiblot, J. & Johnsson, K. Imaging and manipulating proteins in live cells through covalent labeling. Nat. Chem. Biol. 11, 917–923 (2015).
Klymchenko, A. S. Solvatochromic and fluorogenic dyes as environment-sensitive probes: Design and biological applications. Acc. Chem. Res. 50, 366–375 (2017).
Reisch, A. & Klymchenko, A. S. Fluorescent polymer nanoparticles based on dyes: Seeking brighter tools for bioimaging. Small 12, 1968–1992 (2016).
Michalet, X. et al. Quantum dots for live cells, in vivo imaging, and diagnostics. Science 307, 538–544 (2005).
Goldman, R. D. & Spector, D. L. Basic Methods in Microscopy: Protocols and Concepts from Cells: A Laboratory Manual. (Cold Spring Harbor, N.Y., 2005).
Specht, E. A., Braselmann, E. & Palmer, A. E. A critical and comparative review of fluorescent tools for live-cell imaging. Annu. Rev. Physiol. 79, 93–117 (2017).
Nienhaus, K. & Ulrich Nienhaus, G. Fluorescent proteins for live-cell imaging with super-resolution. Chem. Soc. Rev. 43, 1088–1106 (2014).
Lakowicz, J. R. in Principles of Fluorescence Spectroscopy 205–235 (Springer US, 2006).
Yokota, H. Fluorescence microscopy for visualizing single-molecule protein dynamics. Biochim. Biophys. Acta - Gen. Subj. 1864, 129362 (2020).
Farka, Z. et al. Advances in optical single-molecule detection: En route to supersensitive bioaffinity assays. Angew. Chem. Int. Ed. 59, 10746–10773 (2020).
Möckl, L., Lamb, D. C. & Bräuchle, C. Super-resolved fluorescence microscopy: Nobel Prize in Chemistry 2014 for Eric Betzig, Stefan Hell, and William E. Moerner. Angew. Chem. Int. Ed. Engl. 53, 13972–13977 (2014).
Hell, S. W. et al. The 2015 super-resolution microscopy roadmap. J. Phys. D. Appl. Phys. 48, 443001 (2015).
Ha, T. & Tinnefeld, P. Photophysics of fluorescent probes for single-molecule biophysics and super-resolution imaging. Annu. Rev. Phys. Chem. 63, 595–617 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Toseland, C. P. Fluorescent labeling and modification of proteins. J. Chem. Biol. 6, 85–95 (2013).
Article PubMed PubMed Central Google Scholar
Liu, J. & Cui, Z. Fluorescent labeling of proteins of interest in live cells: Beyond fluorescent proteins. Bioconjug. Chem. 31, 1587–1595 (2020).
Article CAS PubMed Google Scholar
Kim, Y. et al. Efficient site-specific labeling of proteins via cysteines. Bioconjug. Chem. 19, 786–791 (2008).
Article CAS PubMed PubMed Central Google Scholar
Quast, R. B., Fatemi, F., Kranendonk, M., Margeat, E. & Truan, G. Accurate determination of human CPR conformational equilibrium by smFRET using dual orthogonal noncanonical amino acid labeling. ChemBioChem 20, 659–666 (2019).
Article CAS PubMed Google Scholar
Schulz, S., Kramm, K., Werner, F. & Grohmann, D. Fluorescently labeled recombinant RNAP system to probe archaeal transcription initiation. Methods 86, 10–18 (2015).
Article CAS PubMed Google Scholar
Kim, J. et al. Simple and efficient strategy for site-specific dual labeling of proteins for single-molecule fluorescence resonance energy transfer analysis. Anal. Chem. 85, 1468–1474 (2013).
Article CAS PubMed Google Scholar
Lee, T. C. et al. Dual unnatural amino acid incorporation and click-chemistry labeling to enable single-molecule FRET studies of p97 folding. Chembiochem 17, 981–984 (2016).
Article CAS PubMed PubMed Central Google Scholar
Laxman, P., Ansari, S., Gaus, K. & Goyette, J. The benefits of unnatural amino acid incorporation as protein labels for single molecule localization microscopy. Front. Chem. 9, 641355 (2021).
Article CAS PubMed PubMed Central Google Scholar
Tian, H., Sakmar, T. P. & Huber, T. Site-specific labeling of genetically encoded azido groups for multicolor, single-molecule fluorescence imaging of GPCRs. Methods Cell Biol. 117, 267–303 (2013).
Article CAS PubMed Google Scholar
Gust, A. et al. A starting point for fluorescence-based single-molecule measurements in biomolecular research. Molecules 19, 15824–15865 (2014).
Article PubMed PubMed Central Google Scholar
Roy, R., Hohng, S. & Ha, T. A practical guide to single-molecule FRET. Nat. Methods 5, 507–516 (2008).
Article CAS PubMed PubMed Central Google Scholar
Jeschke, G. The contribution of modern EPR to structural biology. Emerg. Top. Life Sci. 2, 9–18 (2018).
Article CAS PubMed PubMed Central Google Scholar
Milov, A. D., Maryasov, A. G. & Tsvetkov, Y. D. Pulsed electron double resonance (PELDOR) and its applications in free-radicals research. Appl. Magn. Reson. 15, 107–143 (1998).
Article CAS Google Scholar
Fleissner, M. R. et al. Proc. Natl. Acad. Sci. USA 106, 21637–21642 (2009). LP – 21642.
Article ADS CAS PubMed PubMed Central Google Scholar
Schmidt, M. J., Borbas, J., Drescher, M. & Summerer, D. A genetically encoded spin label for electron paramagnetic resonance distance measurements. J. Am. Chem. Soc. 136, 1238–1241 (2014).
Article CAS PubMed Google Scholar
Grohmann, D. et al. RNA-binding to archaeal RNA polymerase subunits F/E: a DEER and FRET study. J. Am. Chem. Soc. 132, 5954–5955 (2010).
Article CAS PubMed Google Scholar
Sanabria, H. et al. Resolving dynamics and function of transient states in single enzyme molecules. Nat. Commun. 11, 1231 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Peter, M. F. et al. Cross-validation of distance measurements in proteins by PELDOR/DEER and single-molecule FRET. Nat. Commun. 13, 4396 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Feng, Y. et al. Conformational dynamics of apo-GlnBP revealed by experimental and computational analysis. Angew. Chem. Int. Ed. 55, 13990–13994 (2016).
Article ADS CAS Google Scholar
Husada, F. et al. Conformational dynamics of the ABC transporter McjD seen by single-molecule FRET. EMBO J. 37, e100056 (2018).
Article PubMed PubMed Central Google Scholar
Sheppard, C. et al. Repression of RNA polymerase by the archaeo-viral regulator ORF145/RIP. Nat. Commun. 7, 1–13 (2016).
Sadler, E. E., Kapanidis, A. N. & Tucker, S. J. Solution-based single-molecule FRET studies of K(+) channel gating in a lipid bilayer. Biophys. J. 110, 2663–2670 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Craggs, T. D. et al. Substrate conformational dynamics facilitate structure-specific recognition of gapped DNA by DNA polymerase. Nucleic Acids Res. 47, 10788–10800 (2019).
Article CAS PubMed PubMed Central Google Scholar
Fitzgerald, G. A. et al. Quantifying secondary transport at single-molecule resolution. Nature 575, 528–534 (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
Mapa, K. et al. The conformational dynamics of the mitochondrial Hsp70 chaperone. Mol. Cell 38, 89–100 (2010).
Article CAS PubMed Google Scholar
Vishwakarma, R. K. et al. Single-molecule analysis reveals the mechanism of transcription activation in M. tuberculosis. Sci. Adv. 4, 1–8 (2018).
McCann, J. J., Zheng, L., Chiantia, S. & Bowen, M. E. Domain orientation in the N-Terminal PDZ tandem from PSD-95 is maintained in the full-length protein. Structure 19, 810–820 (2011).
Article CAS PubMed PubMed Central Google Scholar
Hellenkamp, B., Wortmann, P., Kandzia, F., Zacharias, M. & Hugel, T. Multidomain structure and correlated dynamics determined by self-consistent FRET networks. Nat. Methods 14, 174–180 (2017).
Article CAS PubMed Google Scholar
Sánchez-Rico, C., Voith von Voithenberg, L., Warner, L., Lamb, D. C. & Sattler, M. Effects of fluorophore attachment on protein conformation and dynamics studied by spFRET and NMR spectroscopy. Chemistry 23, 14267–14277 (2017).
Article PubMed PubMed Central Google Scholar
Dimura, M. et al. Quantitative FRET studies and integrative modeling unravel the structure and dynamics of biomolecular systems. Curr. Opin. Struct. Biol. 40, 163–185 (2016).
Article CAS PubMed Google Scholar
Yanez Orozco, I. S. et al. Identifying weak interdomain interactions that stabilize the supertertiary structure of the N-terminal tandem PDZ domains of PSD-95. Nat. Commun. 9, 3724 (2018).
Article ADS PubMed PubMed Central Google Scholar
Dimura, M. et al. Automated and optimally FRET-assisted structural modeling. Nat. Commun. 11, 5394 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Krishnamohan, A., Hamilton, G. L., Goutam, R., Sanabria, H. & Morcos, F. Coevolution and smFRET enhances conformation sampling and FRET experimental design in tandem PDZ1-2 proteins. J. Phys. Chem. B 127, 884–898 (2023).
Article CAS PubMed PubMed Central Google Scholar
Sikkema, H. R. & Poolman, B. In silico method for selecting residue pairs for single-molecule microscopy and spectroscopy. Sci. Rep. 11, 5756 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Bavishi, K. et al. Direct observation of multiple conformational states in Cytochrome P450 oxidoreductase and their modulation by membrane environment and ionic strength. Sci. Rep. 8, 1–9 (2018).
Article CAS Google Scholar
Mitchell. T. M. in Machine Learning (1997).
Lewis. D. D. in Naive (Bayes) at forty: The independence assumption in information retrieval. (eds. Nédellec, C. & Rouveirol, C.) (Springer Berlin Heidelberg, 1998).
Berntsson, R. P.-A., Smits, S. H. J., Schmitt, L., Slotboom, D.-J. & Poolman, B. A structural classification of substrate-binding proteins. FEBS Lett. 584, 2606–2617 (2010).
Article CAS PubMed Google Scholar
Gouridis, G. et al. Conformational dynamics in substrate-binding domains influences transport in the ABC importer GlnPQ. Nat. Struct. Mol. Biol. 22, 57–64 (2015).
Article CAS PubMed Google Scholar
de Boer, M. et al. Conformational and dynamic plasticity in substrate-binding proteins underlies selective transport in ABC importers. Elife 8, e44652 (2019).
Glaser, F. et al. ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics 19, 163–164 (2003).
Article CAS PubMed Google Scholar
Ashkenazy, H. et al. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res. 44, W344–W350 (2016).
Article CAS PubMed PubMed Central Google Scholar
Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983).
Article CAS PubMed Google Scholar
Hamelryck, T. An amino acid has two sides: a new 2D measure provides a different view of solvent exposure. Proteins Struct. Funct. Genet. 59, 38–48 (2005).
Article CAS PubMed Google Scholar
Sanner, M. F., Olson, A. J. & Spehner, J.-C. Reduced surface: an efficient way to compute molecular surfaces. Biopolymers 38, 305–320 (1996).
Article CAS PubMed Google Scholar
Hobohm, U., Scharf, M., Schneider, R. & Sander, C. Selection of representative protein data sets. Protein Sci. 1, 409–417 (1992).
Article CAS PubMed PubMed Central Google Scholar
Griep, S. & Hobohm, U. PDBselect 1992–2009 and PDBfilter-select Nucleic Acids Res. 38, D318–D319 (2010).
Hand, D. J. & Yu, K. Idiot's Bayes—not so stupid after all? Int. Stat. Rev. 69, 385–398 (2001).
Rish, I. in IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence (2001).
Domingos, P. On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 29, 103–130 (1997).
Niculescu-mizil, A. & Caruana, R. Predicting good probabilities with supervised learning. In Proceedings of the 22nd International Conference on Machine Learning. (2005).
Gouridis, G. et al. Structural dynamics in the evolution of a bilobed protein scaffold. Proc. Natl. Acad. Sci. USA 118, e2026165118 (2021).
Article CAS PubMed PubMed Central Google Scholar
Mächtel, R., Narducci, A., Griffith, D. A., Cordes, T. & Orelle, C. An integrated transport mechanism of the maltose ABC importer. Res. Microbiol. 170, 321–337 (2019).
Article PubMed PubMed Central Google Scholar
Sindbert, S. et al. Accurate distance determination of nucleic acids via Förster resonance energy transfer: implications of dye linker length and rigidity. J. Am. Chem. Soc. 133, 2463–2480 (2011).
Article CAS PubMed Google Scholar
Hellenkamp, B. et al. Precision and accuracy of single-molecule FRET measurements-a multi-laboratory benchmark study. Nat. Methods 15, 669–676 (2018).
Article CAS PubMed PubMed Central Google Scholar
Best, R. B., Hofmann, H., Nettels, D. & Schuler, B. Quantitative interpretation of FRET experiments via molecular simulation: force field and validation. Biophys. J. 108, 2721–2731 (2015).
Article ADS CAS PubMed PubMed Central Google Scholar
Hoefling, M. & Grubmüller, H. In silico FRET from simulated dye dynamics. Comput. Phys. Commun. 184, 841–852 (2013).
Article ADS CAS Google Scholar
Reinartz, I. et al. Simulation of FRET dyes allows quantitative comparison against experimental data. J. Chem. Phys. 148, 123321 (2018).
Article ADS PubMed Google Scholar
Agam, G. et al. Reliability and accuracy of single-molecule FRET studies for characterization of structural dynamics and distances in proteins. Nat. Methods 20, 523–535 (2023).
Article CAS PubMed PubMed Central Google Scholar
Höfig, H., Gabba, M., Poblete, S., Kempe, D. & Fitter, J. Inter-dye distance distributions studied by a combination of single-molecule FRET-filtered lifetime measurements and a weighted accessible volume (wAV) algorithm. Molecules 19, 19269–19291 (2014).
Article PubMed PubMed Central Google Scholar
Cerminara, M., Schöne, A., Ritter, I., Gabba, M. & Fitter, J. Mapping multiple distances in a multidomain protein for the identification of folding intermediates. Biophys. J. 118, 688–697 (2020).
Article ADS CAS PubMed Google Scholar
Schmidt, A., Altincekic, N., Gustmann, H., Wachtveitl, J. & Hengesbach, M. The protein microenvironment governs the suitability of labeling sites for single-molecule spectroscopy of RNP complexes. ACS Chem. Biol. 13, 2472–2483 (2018).
Article CAS PubMed Google Scholar
Fijen, C. et al. Using single-molecule FRET to probe the nucleotide-dependent conformational landscape of polymerase β-DNA complexes. J. Biol. Chem. 295, 9012–9020 (2020).
Article CAS PubMed PubMed Central Google Scholar
Dörfler, T., Eilert, T., Röcker, C., Nagy, J. & Michaelis. J. Structural information from single-molecule FRET experiments using the fast nano-positioning system. J. Vis. Exp. 120, 54782 (2017).
Kalinin, S. et al. A toolkit and benchmark study for FRET-restrained high-precision structural modeling. Nat. Methods 9, 1218–1225 (2012).
Article CAS PubMed Google Scholar
Solscheid, C. et al. Development of a reagentless biosensor for inorganic phosphate, applicable over a wide concentration range. Biochemistry 54, 5054–5062 (2015).
Article CAS PubMed Google Scholar
Gebhardt, C. et al. Molecular and spectroscopic characterization of green and red cyanine fluorophores from the alexa fluor and AF series*. ChemPhysChem 22, 1566–1583 (2021).
Article CAS PubMed PubMed Central Google Scholar
de Boer, M., Gouridis, G., Muthahari, Y. A. & Cordes, T. Single-molecule observation of ligand binding and conformational changes in FeuA. Biophys. J. 117, 1642–1654 (2019).
Article PubMed PubMed Central Google Scholar
Ahmed, A., Rippmann, F., Barnickel, G. & Gohlke, H. A normal mode-based geometric simulation approach for exploring biologically relevant conformational transitions in proteins. J. Chem. Inf. Model. 51, 1604–1622 (2011).
Article CAS PubMed Google Scholar
Krüger, D. M., Ahmed, A. & Gohlke, H. NMSim Web Server: integrated approach for normal mode-based geometric simulations of biologically relevant conformational transitions in proteins Nucleic Acids Res. 40, W310–W316 (2012).
Worth, C. L., Preissner, R. & Blundell, T. L. SDM—a server for predicting effects of mutations on protein stability and malfunction Nucleic Acids Res. 39, W215–W222 (2011).
Pandurangan, A. P., Ochoa-Montaño, B., Ascher, D. B. & Blundell, T. L. W. SDM: a server for predicting effects of mutations on protein stability. Nucleic Acids Res. 45, W229–W235 (2017).
Graen, T., Hoefling, M. & Grubmüller, H. AMBER-DYES: Characterization of charge fluctuations and force field parameterization of fluorescent dyes for molecular dynamics simulations. J. Chem. Theory Comput. 10, 5505–5512 (2014).
Article CAS PubMed Google Scholar
Cordes, T. et al. Sensing DNA opening in transcription using quenchable Förster resonance energy transfer. Biochemistry 49, 9171–9180 (2010).
Article CAS PubMed Google Scholar
Zhou, R., Kunzelmann, S., Webb, M. R. & Ha, T. Detecting intramolecular conformational dynamics of single molecules in short distance range with subnanometer sensitivity. Nano Lett. 11, 5482–5488 (2011).
Article ADS CAS PubMed PubMed Central Google Scholar
Hwang, H. & Myong, S. Protein induced fluorescence enhancement (PIFE) for probing protein-nucleic acid interactions. Chem. Soc. Rev. 43, 1221–1229 (2014).
Article CAS PubMed PubMed Central Google Scholar
Ploetz, E. et al. Förster resonance energy transfer and protein-induced fluorescence enhancement as synergetic multi-scale molecular rulers. Sci. Rep. 6, 33257 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Hohlbein, J., Gryte, K., Heilemann, M. & Kapanidis, A. N. Surfing on a new wave of single-molecule fluorescence methods. Phys. Biol. 7, 31001 (2010).
Article Google Scholar
Hohlbein, J., Craggs, T. D. & Cordes, T. Alternating-laser excitation: single-molecule FRET and beyond. Chem. Soc. Rev. 43, 1156–1171 (2014).
Article CAS PubMed Google Scholar
Freedman, D. A. in Statistical Models: Theory and Practice. (Cambridge University Press, 2009).
Shieh, G. A comparison of two indices for the intraclass correlation coefficient. Behav. Res. Methods 44, 1212–1223 (2012).
Article ADS PubMed Google Scholar
Cohen, J. in Statistical Power Analysis for the Behavioral Sciences (Academic Press, 1977).
Okoh, M. P., Hunter, J. L., Corrie, J. E. T. & Webb, M. R. A biosensor for inorganic phosphate using a rhodamine-labeled phosphate binding protein. Biochemistry 45, 14764–14771 (2006).
Article CAS PubMed Google Scholar
Eggeling, C., Fries, J. R., Brand, L., Günther. R. & Seidel. C. A. M. Monitoring conformational dynamics of a single molecule by selective fluorescence spectroscopy. Proc. Natl. Acad. Sci. 95, 1556–1561 (1998).
Yao, N., Ledvina, P. S., Choudhary, A. & Quiocho, F. A. Modulation of a salt link does not affect binding of phosphate to its specific active transport receptor. Biochemistry 35, 2079–2085 (1996).
Article CAS PubMed Google Scholar
Yang, Y., Ballent. W. & Mayer. B. K. High-affinity phosphate-binding protein (PBP) for phosphorous recovery: proof of concept using recombinant Escherichia coli. FEMS Microbiol. Lett. 363, https://doi.org/10.1093/femsle/fnw240 (2016).
Yamashita, A., Singh, S. K., Kawate, T., Jin, Y. & Gouaux, E. Crystal structure of a bacterial homologue of Na+/Cl-dependent neurotransmitter transporters. Nature 437, 215–223 (2005).
Article ADS CAS PubMed Google Scholar
Zhao, Y. et al. Single-molecule dynamics of gating in a neurotransmitter transporter homologue. Nature 465, 188–193 (2010).
Article ADS CAS PubMed PubMed Central Google Scholar
Johnson, S. J., Taylor, J. S. & Beese, L. S. Processive DNA synthesis observed in a polymerase crystal suggests a mechanism for the prevention of frameshift mutations. Proc. Natl. Acad. Sci. USA 100, 3895–3900 (2003).
Article ADS CAS PubMed PubMed Central Google Scholar
Hohlbein, J. et al. Conformational landscapes of DNA polymerase I and mutator derivatives establish fidelity checkpoints for nucleotide insertion. Nat. Commun. 4, 2131 (2013).
Article ADS PubMed Google Scholar

Download references

Acknowledgements

This work was financed by an ERC Starting Grant (ERC-StG 638536 - SM-IMPORT to T.C.) and an ERC Consolidator Grant (ProForce to J.L.), the Deutsche Forschungsgemeinschaft (GRK2062 project C03 to T.C., SFB863 projects A11 and A13 to J.L. and T.C.; Sachbeihilfe CO879/4-1 to T.C.), BMBF (KMU innovative “quantum FRET” to T.C.), LMU excellent, the Center for Integrated Protein Science Munich (CiPSM), and the Center for Nanoscience (CeNS). We thank all members of the Cordes lab for actively testing the labelizer procedure and web server, in particular Rebecca Mächtel, Alessandra Narducci, Oliver Brix, Leonor Correia, Shirsha Roy, and Chuyu Han. Finally, we thank our colleagues Gregor Hagelücken, Eitan Lerner, Nicole Robb, and Giorgos Gouridis for discussions and support of the project.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Physical and Synthetic Biology, Faculty of Biology, Ludwig-Maximilians-Universität München, Großhadernerstr. 2-4, Planegg-Martinsried, Germany
Christian Gebhardt, Pascal Bawidamann, Anna-Katharina Spring, Konstantin Schütze, Gabriel G. Moya Muñoz, Nicolas D. Wendler, Douglas A. Griffith & Thorben Cordes
Biophysical Chemistry, Department of Chemistry and Chemical Biology, Technische Universität Dortmund, Dortmund, Germany
Anna-Katharina Spring, Gabriel G. Moya Muñoz, Nicolas D. Wendler & Thorben Cordes
Klinikum rechts der Isar, Technische Universität München, Klinik und Poliklinik für Innere Medizin II, München, Germany
Robin Schenk
Department of Physics and Center for NanoScience, Ludwig-Maximilians-Universität München, Amalienstr. 54, München, Germany
Jan Lipfert
Soft Condensed Matter and Biophysics, Department of Physics and Debye Institute for Nanomaterials Science, Utrecht University, Princetonplein 1, Utrecht, The Netherlands
Jan Lipfert

Authors

Christian Gebhardt
Pascal Bawidamann
Anna-Katharina Spring
Robin Schenk
Konstantin Schütze
Gabriel G. Moya Muñoz
Nicolas D. Wendler
Douglas A. Griffith
Jan Lipfert
Thorben Cordes

Contributions

C.G. and T.C. conceived and designed the study. C.G. performed research, data analysis, and software implementation. J.L. provided analytical tools. K.S. and T.C. analyzed data. C.G., P.B., R.S., and K.S. implemented the webserver. A.K.S., N.W., and G.G.M.M. performed research. D.A.G. prepared PBP variants, performed research, and analyzed data. J.L. and T.C. supervised the study and acquired funding. C.G., J.L., and T.C. discussed and interpreted the results and wrote the manuscript in consultation with all authors.

Corresponding authors

Correspondence to Christian Gebhardt, Jan Lipfert or Thorben Cordes.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Christina Bergonzo and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Supplementary Data 1

Description of Additional Supplementary Files

Transparent Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Gebhardt, C., Bawidamann, P., Spring, AK. et al. Labelizer: systematic selection of protein residues for covalent fluorophore labeling. Nat Commun 16, 4147 (2025). https://doi.org/10.1038/s41467-025-58602-y

Received: 28 June 2023
Accepted: 27 March 2025
Published: 04 May 2025
Version of record: 04 May 2025
DOI: https://doi.org/10.1038/s41467-025-58602-y
