Introduction

Immune cell therapies have emerged as viable candidates for the treatment of human diseases1,2. The culturing and expansion of cells for immune cell therapies is a critical process that impacts the efficacy of these therapies3,4,5,6. In clinical settings, cells are typically activated during culturing and expansion through stimulation with antigen-presenting cells displaying a specific antigen. Maintaining these cells over the subsequent timeline requires dynamic, time-dependent information about the intracellular growth state of the expanding culture7,8. Typically, at some point after the initial activation, the intracellular growth rate slows and the cells must be restimulated9,10. The standard way to follow these intracellular dynamics is with experimental measurements such as flow cytometry, which are costly, time-consuming, and require advanced equipment11,12,13. There is a pressing need to devise alternative methodologies to probe the activation states of immune cells during clinical culturing and expansion14,15. In this work, we use simple regression methods with measured Raman spectroscopy data to quantitatively estimate experimental measurements of known regulatory T (Treg) cell activation markers.

Tregs are a subpopulation of T lymphocytes that maintain immune homeostasis and prevent autoimmunity16. Tregs comprise 5-10% of the CD4+ T cell population and can be characterized by their expression of CD4, CD25, and the transcription factor FOXP317. These specialized immune cells modulate immune responses to suppress immune activation18,19,20. Cell therapies with Tregs are currently undergoing clinical trials to treat autoimmune disorders, transplant rejection, and graft-versus-host disease21. Tregs have also been transduced to express a Chimeric Antigen Receptor (CAR) for CAR-Treg therapy22, building on the success of CAR T cell therapy for cancer treatment23.

The latency-associated peptide (LAP) is a Treg biomarker that forms part of the latent form of transforming growth factor-β1 (TGF-β1), which in turn controls T cell development, homeostasis, and function24. TGF-β1 is a suppressive cytokine expressed by Tregs that is secreted upon proteolytic cleavage of the latent, LAP-complexed form25. Glycoprotein A repetitions predominant (GARP), another Treg biomarker, is a transmembrane protein containing leucine-rich repeats that can bind and present latent TGF-β1 on the cell surface26. Both LAP and GARP are highly expressed on the surface of activated Tregs26, and we aim to estimate flow cytometry measurements of these two biomarkers in this study. When expanding the number of Tregs in culture, decreases in the levels of these markers indicate slowing Treg growth, so their levels can be used to guide when to restimulate the Tregs for greater expansion in their numbers27. Restimulating the cells too early is known to cause activation-induced cell death28. Thus, it would be useful if the levels of these Treg activation biomarkers could be analyzed by an on-line method that does not depend on frequent sampling for off-line flow cytometry analysis.

Raman spectroscopy is an experimental method that can generate a large amount of information about a biological system. Spectroscopic methods have the advantage that they can be implemented non-invasively, non-destructively, and cost-effectively compared to other means of chemometric analysis. Raman spectroscopy is based on Raman scattering, a phenomenon of light-matter interaction in which light is inelastically scattered, resulting in a different wavelength after scattering due to an exchange of energy29. Raman scattering was first predicted by Smekal30 and then observed experimentally by Raman and Krishnan31. A Raman spectrum is a plot of the scattering intensity as a function of the light frequency shift after Raman scattering, referred to as a Raman shift and measured in wavenumbers29. Talari et al.32 provide a review on biological assignments of Raman shifts.

The complexity and multidimensionality of biological Raman spectra require advanced regression techniques to decipher and extract meaningful information. Data-driven machine learning (ML) modelling approaches excel at recognizing intricate patterns and relationships within high-dimensional datasets33,34,35. ML techniques can efficiently process large amounts of data and handle non-linearities. By training ML models on large-scale Raman spectral data, these models can learn to discern subtle spectral features, capture non-linear dependencies, and uncover hidden structures that may elude traditional analytical approaches. Leveraging these methods allows for more accurate classification, prediction of unknown spectra, and identification of structural changes or interactions. The utilization of ML methods for analyzing Raman spectral data has been shown to be effective36,37,38. In oncology, substantial research has been carried out on the ability to classify cancers and other diseases using Raman spectroscopic data36.

The main question of interest in this work was formulated as follows: is it possible to use the intensities in the Raman spectra of Tregs to estimate their levels of LAP and GARP, so that measurements of these Treg biomarkers could be replaced by measurements of Raman spectra? Such a capability could complement or even circumvent laboratory methods to perform real-time analysis, enabling rapid decision-making in applications such as quality control, process optimization, and characterization. In this work, a preliminary analysis was performed using Multisource Correlation Analysis39. The results of this linear method led to the investigation of the problem more formally using five linear supervised ML methods. Three methods were variations on the least-squares algorithm: Ordinary Least-Squares (OLS), Least Absolute Shrinkage and Selection Operator (Lasso), and Ridge Regression; the fourth was Partial Least-Squares (PLS) Regression, which combines a projection onto latent components, in the spirit of Principal Component Analysis (PCA), with least-squares regression; and the final model was Linear Support Vector Regression (LSVR).

In the next section, we formulate the mathematical statement of the problem of interest in this paper. The input and output data for our workflow will be formally described in the context of the theory of the different least-squares models, and we refer readers to the literature on both PLS40 and LSVR41.

Modelling and theoretical aspects

A Raman spectrum is denoted as \({\left({X}_{j}\right)}_{j=1}^{m}\in {\mathbb{R}}^{m}\), and is a vector in m-dimensional Euclidean space with components given by the experimentally measured spectral intensities \(\left\{{X}_{j}:j=1,\dots ,m\right\}\) at each of the Raman shifts \(j=1,\dots ,m\). There is a fixed number \(m\) of Raman shifts, or features, measured in each Raman spectrum; in this paper, \(m=963\) throughout. A collection of n Raman spectra, where n denotes the number of statistical samples, is denoted with boldface as \(\mathbf{X}=\left({X}_{ij}\right)\in {\mathbb{R}}^{n\times m}\) and is called the regressor (or design) matrix with n rows and m columns. For each statistical sample \(i=1,\dots ,n\), there is, at the same time point, a corresponding response variable, and all the response variables together form the response vector, \({\mathbf{Y}=\left({Y}_{i}\right)}_{i=1}^{n}\in {\mathbb{R}}^{n}\). A full dataset is defined as \(\left\{\mathbf{Y}, \mathbf{X}\right\}={\{{Y}_{i}, {X}_{i1}, \dots , {X}_{im}\}}_{i=1}^{n}\), i.e., all of the input features and responses measured across the samples. Additionally, we used eight Treg donors for data acquisition, and the data obtained, i.e., the Raman spectra and biomarker response variables, were measured over six days. Therefore, the number of statistical samples is \(n\le 48\) for most purposes.
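To make the shapes of these objects concrete, the following minimal NumPy sketch builds an illustrative, randomly generated dataset of the same dimensions; the array names are ours, not the study's.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 48, 963           # n statistical samples (8 donors x 6 days), m Raman shifts (features)
X = rng.random((n, m))   # design matrix: one Raman spectrum per row
Y = rng.random(n)        # response vector: one biomarker value per sample

print(X.shape, Y.shape)  # (48, 963) (48,)
```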

The problem of interest in this paper can now be stated mathematically. Suppose we have a full dataset \(\left\{\mathbf{Y}, \mathbf{X}\right\}\), and a separate collection of k measured Raman spectra \(\overline{\mathbf{X}}\in {\mathbb{R} }^{k\times m}\) for which measured response data may be unavailable for some or all of the samples. Then, given a set of parameters \(\Theta\), we want to build a parametric function \(f:{\mathbb{R}}^{k\times m}\to {\mathbb{R}}^{k}\) that accurately estimates the response variables \(\overline{\mathbf{Y} }\) corresponding to \(\overline{\mathbf{X} }\), i.e., what would have been measured had we performed an experimental measurement for those samples. The mathematical statement above does not specify the functional form of \(f=f(\overline{\mathbf{X} };\Theta )\); however, we assume that the function \(f\) is linear in its argument \(\overline{\mathbf{X} }\). Let us introduce the parameters \(\Theta =\left\{{\beta }_{1},\dots ,{\beta }_{m}, {\beta }_{0}\right\}=\left\{{\varvec{\upbeta}}, {\beta }_{0}\right\}\) and refer to \({\varvec{\upbeta}}={\left({\beta }_{1},\dots ,{\beta }_{m}\right)}^{\text{T}}\in {\mathbb{R}}^{m}\) as a weight vector, with \({\beta }_{0}\in {\mathbb{R}}\) a constant. Then a linear parametric model assumes that, for the \({i}^{\text{th}}\) sample of some design matrix \(\mathbf{X}\), the response has the form

$${{Y}}_{{i}}^{{^{\prime}}}={\beta }_{1}{{X}}_{{i}1}+\dots +{\beta }_{{m}}{{X}}_{{im}}+{\beta }_{0}$$
(1)

In vector form, the estimated response vector \({\mathbf{Y}}^{\mathbf{^{\prime}}}\) can therefore be written as

$${{\mathbf{Y}}}^{{^{\prime}}}={\mathbf{X}}{\varvec{\upbeta}}+{\beta }_{0}{1}_{{\boldsymbol{k}}}$$
(2)

where \({1}_{{\boldsymbol{k}}}\) is the k-dimensional vector with all entries equal to 1. The quantity \({\beta }_{0}\) is referred to as a bias term or offset in the estimation. Eq. (2) above is computed easily using matrix multiplication assuming that the optimal weights \({\varvec{\upbeta}}\) and offset \({\beta }_{0}\) are known. The mathematical problem has now shifted to finding parameters \(\Theta =\left\{{\varvec{\upbeta}}, {\beta }_{0}\right\}\) which give the most accurate estimates. The use of least-squares regression to find optimal parameters \(\Theta\) can be summarized as follows: we have a dataset \(\left\{\mathbf{Y},\mathbf{X}\right\}\) and we want to use this dataset to generate the parameters \(\Theta\) based on the patterns in the data. Then, those parameters \(\Theta\) can be used with \(\mathbf{X}\) in Eq. (2) to obtain the estimate \({\mathbf{Y}}^{\mathbf{^{\prime}}}\). The process of finding optimal parameters \(\Theta\) from the dataset \(\left\{\mathbf{Y},\mathbf{X}\right\}\) is referred to as training the model \(f\) and \(\left\{\mathbf{Y},\mathbf{X}\right\}\) is referred to as the training set. The process of validating the trained model by using it to estimate responses from samples it was not trained with is referred to as testing the model \(f\). In this work, we will refer to the process of exporting the trained model to estimate responses for donors that are completely novel to the model as validation or external donor validation.
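As a minimal sketch, evaluating Eq. (2) amounts to a single matrix-vector product plus a broadcast offset; the weights and offset below are random placeholders rather than trained values.

```python
import numpy as np

rng = np.random.default_rng(1)

k, m = 10, 963
X_new = rng.random((k, m))   # k Raman spectra to estimate responses for
beta = rng.random(m)         # weight vector (placeholder values)
beta0 = 0.5                  # offset/bias term (placeholder value)

# Eq. (2): Y' = X beta + beta0 * 1_k (NumPy broadcasts the scalar offset)
Y_prime = X_new @ beta + beta0
print(Y_prime.shape)         # (10,)
```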

Least-squares model and regularization

The least-squares algorithm finds the parameters \(\Theta\) which minimize the distance between the estimated responses given by \(\Theta\) and the training responses. Let

$${\Vert {\mathbf{Q}}\Vert }_{{p}}:={\left(\sum_{{i}=1}^{{n}}{\left|{{Q}}_{{i}}\right|}^{{p}}\right)}^{1/{p}}$$
(3)

be the Lp norm of a vector \(\mathbf{Q}=\left({Q}_{1},\dots ,{Q}_{n}\right)\in {\mathbb{R}}^{n}\). Now, introduce the least-squares loss function as

$$\mathcal{O}\left({\varvec{\upbeta}},{\beta }_{0}\right)={\left({\Vert {\mathbf{Y}}-{{\mathbf{Y}}}^{{^{\prime}}}\Vert }_{2}\right)}^{2}=\sum_{{i}=1}^{{n}}{\left({{Y}}_{{i}}-{{Y}}_{{i}}^{{^{\prime}}}\right)}^{2}=\sum_{{i}=1}^{{n}}{\left({{Y}}_{{i}}-\sum_{{j}=1}^{{m}}{{X}}_{{ij}}{\beta }_{{j}}-{\beta }_{0}\right)}^{2}$$
(4)

The more accurate the estimate \(\mathbf{Y}{^{\prime}}\), the smaller \(\mathcal{O}\left({\varvec{\upbeta}},{\beta }_{0}\right)\). Therefore, the problem has now shifted to minimizing \(\mathcal{O}\left({\varvec{\upbeta}},{\beta }_{0}\right)\). The loss function depends only on the parameters \({\varvec{\upbeta}}, {\beta }_{0}\), as the training dataset \(\left\{\mathbf{Y},\mathbf{X}\right\}\) is fixed and \(\mathbf{Y}{^{\prime}}\) depends on \(\mathbf{X},{\varvec{\upbeta}},\text{ and }{\beta }_{0}.\) For the OLS model, the optimal parameters, denoted by \(\widehat{\Theta }=\left\{\widehat{{\varvec{\upbeta}}}, {\widehat{\beta }}_{0}\right\},\) can be found as the minimum of \(\mathcal{O}\left({\varvec{\upbeta}}, {\beta }_{0}\right)\) with respect to the arguments \({\varvec{\upbeta}}, {\beta }_{0}\), formally written as

$$\widehat{\Theta }=\left\{\widehat{{\varvec{\upbeta}}}, {\widehat{\beta }}_{0}\right\}=\underset{{\varvec{\upbeta}}, {\beta }_{0}}{{\arg}\; {\min}}\mathcal{O}\left({\varvec{\upbeta}}, {\beta }_{0}\right)$$
(5)

There is an analytical solution to Eq. (5); however, it may not be practical. In this study, we have \(m\gg n\), i.e., the number of features far exceeds the number of samples. This situation often leads to overfitting, in which the model achieves high accuracy on the training responses but performs poorly on testing responses. In such a low-sample-size setting, it is often beneficial to use regularization, in which the loss function is amended to encourage solutions \({\varvec{\upbeta}}, {\beta }_{0}\) in which many components of \({\varvec{\upbeta}}\) are zero.
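For illustration only, the sketch below computes the analytical least-squares fit by augmenting the design matrix with a column of ones for the offset. Because \(m\gg n\), the normal equations are rank-deficient, so a minimum-norm solver is used, and the near-zero training residual illustrates the overfitting risk discussed above; the data are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 48, 963
X, Y = rng.random((n, m)), rng.random(n)

# Append a column of ones so the offset beta_0 is fit jointly with beta.
X_aug = np.hstack([X, np.ones((n, 1))])

# With m >> n, X^T X is singular; lstsq returns the minimum-norm solution
# instead of inverting the normal equations directly.
theta, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
beta, beta0 = theta[:-1], theta[-1]

# Training residual is essentially zero: the model interpolates the training set.
print(np.linalg.norm(Y - (X @ beta + beta0)))
```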

The first regularized model we consider is Lasso, whose loss function has the form

$$\mathcal{L}\left({\varvec{\beta}}, {\beta }_{0};\lambda \right)=\mathcal{O}\left({\varvec{\upbeta}}, {\beta }_{0}\right)+\lambda {\Vert {\varvec{\upbeta}}\Vert }_{1}$$
(6)

and the parameters \(\Theta\) are found according to Eq. (5) with Eq. (6) instead as the loss function. The value \(\lambda\) is referred to as a hyperparameter and is chosen, or tuned, before training. The second regularized model we consider is Ridge, which has a loss function of the form

$$\mathcal{R}\left({\varvec{\upbeta}}, {\beta }_{0};\delta \right)=\mathcal{O}\left({\varvec{\upbeta}}, {\beta }_{0}\right)+\delta {\left({\Vert {\varvec{\upbeta}}\Vert }_{2}\right)}^{2}$$
(7)

and again, the parameters are found according to Eq. (5) with Eq. (7) instead as the loss function, and \(\delta\) is a hyperparameter.
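In practice, both regularized fits can be carried out with scikit-learn, the toolkit used later in this work. A minimal sketch follows; note that scikit-learn's Lasso objective divides the squared-error term by \(2n\), so its alpha argument corresponds to the hyperparameters \(\lambda\) and \(\delta\) of Eqs. (6)-(7) only up to such scaling conventions, and the alpha values below are arbitrary placeholders.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
n, m = 48, 963
X, Y = rng.random((n, m)), rng.random(n)

# alpha plays the role of lambda (Lasso) or delta (Ridge), up to
# scikit-learn's internal scaling of the squared-error term.
lasso = Lasso(alpha=0.01, max_iter=10_000).fit(X, Y)
ridge = Ridge(alpha=1.0).fit(X, Y)

print(lasso.coef_.shape, lasso.intercept_)  # weight vector beta and offset beta_0
print(ridge.coef_.shape, ridge.intercept_)
```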

In this work, the density of a vector refers to the number of non-zero components and the sparsity of a vector refers to the number of components that are equal to zero, such that the sparsest vectors are \(0\)-dense. Therefore, a vector is \(k\)-dense if it contains exactly \(k\) non-zero components and all other components are zero. We refer to feature \(i\) being selected after training a model if the \({i}^{\text{th}}\) component of the weight vector is non-zero, \({\beta }_{i}\ne 0\). If \(k\) features are selected, then the resulting vector \({\varvec{\upbeta}}\) will be \(k\)-dense and \(\left(\text{dim}{\varvec{\upbeta}}-k\right)\)-sparse. Lasso is well-known to enhance sparsity in vector-based feature selection within high-dimensional feature spaces42.
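In code, the density and sparsity of a weight vector are simply counts of its non-zero and zero components; the vector below is a hypothetical example, not a trained result.

```python
import numpy as np

# Hypothetical 963-component weight vector with three selected features.
beta = np.zeros(963)
beta[[120, 415, 782]] = [0.8, -1.3, 2.1]

density = np.count_nonzero(beta)   # 3   -> beta is 3-dense
sparsity = beta.size - density     # 960 -> beta is 960-sparse
print(density, sparsity)
```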

In order to quantify the contribution of a single feature to the estimation of a biomarker, we define the Biomarker Estimation Contribution (BEC) of feature \(j=1,\dots ,m\) for sample \(i=1,\dots ,n\) as the product \({\beta }_{j}{X}_{ij}\). Summing the BECs over all the features of the \({i}^{\text{th}}\) sample gives the estimate of the \({i}^{\text{th}}\) response minus the bias, \({Y}'_{i}-{\beta }_{0}=\sum_{j=1}^{m}{\beta }_{j}{X}_{ij}\). The weight \({\beta }_{j}\) for feature \(j\) may be very high, but if the Raman spectral intensity, \({X}_{ij}\), of that feature for sample \(i\) is very low, then the contribution to the biomarker estimation of feature \(j\) for that sample is low. For a set of \(k\) samples indexed by \(\{1, \dots , k\}\), we define the mean BEC and standard deviation (SD) BEC of feature \(j\) as

$$\text{Mean BEC}={\beta }_{j}{\langle {X}_{ij}\rangle }_{i}\equiv \frac{{\beta }_{j}}{k}\sum_{i=1}^{k}{X}_{ij}$$
(8)
$${\left(\text{SD BEC}\right)}^{2}\equiv \frac{1}{k-1}\sum_{i=1}^{k}{\left({\beta }_{j}{X}_{ij}-{\beta }_{j}{\langle {X}_{ij}\rangle }_{i}\right)}^{2}$$
(9)

where \({\langle {X}_{ij}\rangle }_{i}\) denotes the arithmetic mean over the sample index, as defined in Eq. (8).
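A minimal sketch of Eqs. (8)-(9) for a single feature \(j\), assuming a block of \(k\) spectra and a trained weight vector (both are random placeholders here):

```python
import numpy as np

rng = np.random.default_rng(5)
k, m = 12, 963
X = rng.random((k, m))      # k spectra over which the BEC is summarized
beta = rng.random(m)        # trained weight vector (placeholder values)

j = 100                     # Raman shift (feature) index of interest
bec = beta[j] * X[:, j]     # per-sample BEC, beta_j * X_ij

mean_bec = bec.mean()       # Eq. (8)
sd_bec = bec.std(ddof=1)    # Eq. (9), using the 1/(k-1) convention
print(mean_bec, sd_bec)
```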

Results

Treg expression and spectroscopic data

Tregs respond to activation signals by expanding their cell numbers for 7 to 14 days before returning to a resting state. To obtain greater expansion, they must be restimulated, i.e., activated again. Figure 1A–D shows flow cytometric measurements of LAP and GARP with two different metrics, percent positive (%+) and mean fluorescence intensity (MFI), over the course of expansion for eight Treg donors. The values of LAP and GARP both increased, in terms of both %+ and MFI, following activation on day 0, decreased between days 3 and 11, and then increased again after restimulation on day 11. The relatively low expression of LAP and GARP at day 11 for most donors suggests that the cells had returned to a resting state and were ready to be restimulated. The fold expansion (Figure 1E) increased exponentially over time for all donors, and the mean cell diameter of the culture (Figure 1F) increased on the days immediately following activation or restimulation. Figure 2 shows Raman spectra of donor 5 on days 5, 7, 9, 11, 13, and 15 of expansion and depicts the characteristic patterns in the Raman spectral features over the culturing timeline. Different peaks in the Raman spectrum change in intensity over the culturing timeline, reflecting the fluctuating amounts of Treg biomolecules during cell culturing. It is precisely this correlation between Raman shifts and intracellular biomolecules that is leveraged in this study to estimate levels of LAP and GARP in Tregs over the culturing timeline.

Fig. 1
figure 1

Measurements of Treg biomarkers and properties for eight different Treg donors. The biomarkers shown are (A) %LAP+, (B) %GARP+, (C) MFI LAP, (D) MFI GARP, and the cellular properties shown are (E) fold expansion, and (F) mean cell diameter. Biomarkers A–D were measured using flow cytometry, and both percent positive (%+) and mean fluorescence intensity (MFI) are shown. The full culturing timeline is shown, but only days 5–15 were used for chemometric analysis in this paper; day 0 flow cytometry measurements are therefore not shown. The culturing timeline was 0–21 days post-activation, with sampling on days 0, 3, 5, 7, …, 21. The restimulation on day 11 is indicated by the black arrow.

Fig. 2
figure 2

Mean Raman spectra of Treg donor 5 on days 5–15 post-activation, with restimulation on day 11. The spectral intensities are normalized to nucleic acids: for each Raman shift, the relative intensity on the vertical axis is the ratio of its mean spectral intensity to the intensity of the most intense peak within a neighborhood (defined as \(\pm\) 2 wavenumbers on either side) of the 782 cm-1 peak, which is a nucleic acid band (DNA, RNA, nucleotide bases). The insets are zoomed-in plots of different biologically important regions of the spectrum that vary along the culturing timeline: (A) a phosphatidylinositol band around 415 cm-1 and a cholesterol band around 418 cm-1; (B) a phenylalanine band around 1003 cm-1; and (C) a protein band around 1450 cm-1.

Multisource correlation analysis shows correlation of biomarkers and spectra

The biomarker estimates of the preliminary Multisource Correlation Analysis (MuSCA) method39 are shown in Figure 3. The MuSCA methods are described in Section 1 of the Supplementary Information. Two of the biomarkers, %GARP+ (Figure 3B) and MFI GARP (Figure 3D), had estimates with \({R}^{2}\) of around 0.90. The other two biomarkers in Figure 3 had estimates with relatively lower \({R}^{2}\) and higher RMSE. However, it is important to note that the linear model used in the MuSCA method was trained using the same biomarker values shown in the plots in Figure 3, and therefore the corresponding estimates should be regarded as training estimates. Few testing estimates were used in the MuSCA method, which limits our ability to assess the robustness of the model. Nonetheless, these results showed that certain wavenumbers in the Raman spectrum were correlated with the biomarkers, motivating the subsequent work in this paper. While this preliminary model lacked the validity and rigor of the other ML models in this paper, it revealed that the problem warranted further investigation.

Fig. 3
figure 3

Multisource Correlation Analysis (MuSCA) estimates for the four biomarker measurements. Measured values (solid green circles) and estimates (open red squares) of four different Treg biomarkers, (A) %LAP+, (B) %GARP+, (C) Mean fluorescence intensity (MFI) LAP, and (D) MFI GARP. Measurements were made on days 3, 5, 7, ..., 17, and the solid red line is only meant to show the trend in the estimates and not meant to interpolate between data points. The estimates for each day were determined by modelling the biomarker value using a linear equation with a certain group of Raman spectral intensities. For each biomarker, MuSCA was used to manually determine the specific Raman shifts whose intensity would be used in the linear model. The day 3–17 data points for each biomarker were used in MuSCA to train the linear models.

Lasso provides accurate testing scores with fewer selected features

Testing accuracy scores for the five machine learning models under 50-repeated stratified 8-fold cross-validation are shown in Figure 4. Boxplots of the distributions of the testing \({R}^{2}\) (Figure 4A) and RMSE (Figure 4B) are plotted for each of the five models. Lasso performed at least as well as the other models on both metrics, with a high median \({R}^{2}\) for all biomarkers compared to the other models. %LAP+ had the best testing \({R}^{2}\) scores, which was expected given the homogeneity of the %LAP+ values across the eight donors in Figure 1A. Furthermore, Lasso achieved this level of accuracy with a 20- or 30-dense weight vector \({\varvec{\upbeta}}\), while OLS and Ridge typically had 963-dense weight vectors. For this reason, Lasso was chosen as the primary model for predictive analysis of novel donors: its feature selection reduced the dimensionality of the regression from 963 features to a few dozen, which served two purposes. First, it reduced the risk of overfitting by ensuring that the number of selected features used for the regression was smaller than the number of samples; second, it allowed the most important biological features selected for the regression to be identified. With only 20-30 features selected by Lasso, investigating the biological assignments of the selected features for patterns in the weight vectors was greatly facilitated.

Fig. 4
figure 4

Comparison of testing scores for five machine learning models. The models Ordinary Least-Squares (OLS), Lasso, Ridge, Partial Least-Squares (PLS), and Linear Support Vector Regression (LSVR) were investigated. Boxplots of the (A) testing \({R}^{2}\) and (B) testing RMSE are shown for 50-repeated 8-fold cross-validation with donors 1-8, which was performed with the optimal hyperparameters for each of the four biomarkers with each of the models. For subplot B1, the RMSE of the percent positive (%) and MFI values are plotted on separate y-axes and are not intended to be compared. The dashed lines connecting plots A2 and B2 to plots A1 and B1, respectively, are for illustrative purposes only. Plot B2 is not an exact inset of B1; rather, it is the same plot zoomed in, with different relative positions of the left and right y-axes than in B1.

Selected Lasso hyperparameter shows sparse weight vectors

The results of Lasso hyperparameter tuning are shown in Figure 5. The plots of \({R}^{2}\) and RMSE versus \(\lambda\) showed similar trends for all four biomarkers. The left side of the plots is the OLS limit, \(\lambda \to 0\), where regularization is very weak; there, the number of selected features was close to the maximum number of features, 963, for all biomarkers. As \(\lambda\) increased, the number of selected features decreased, the testing \({R}^{2}\) increased, and the testing RMSE decreased. For all biomarkers, the Lasso regularization increased the testing accuracy during cross-validation up to the point where about 20-30 features were selected, as indicated by the vertical solid black lines in Figure 5. The selected \(\lambda\) values, indicated by the vertical dashed black lines in Figure 5, corresponded to a mean number of selected features of about 15-23 across the four biomarkers. The right side of the plots in Figure 5 is the limit \(\lambda \to \infty\); in this limit, the penalty term dominates the Lasso loss function \(\mathcal{L}\left({\varvec{\upbeta}},{\beta }_{0};\lambda \right)\), and the only way to keep the loss from growing boundlessly is to drive the 1-norm of \({\varvec{\upbeta}}\) to zero, such that the product \(\lambda {\Vert {\varvec{\upbeta}}\Vert }_{1}\) remains finite. As a result, \({\varvec{\upbeta}}\to 0\) and the estimates approach a constant, \({\mathbf{Y}}^{{^{\prime}}}\to {\beta }_{0}{1}_{\text{dim}{\mathbf{Y}}^{{^{\prime}}}}\). It is for this reason that, for both the training and testing scores, the \({R}^{2}\) value becomes extremely negative and the RMSE approaches a constant as \(\lambda \to \infty\).

Fig. 5
figure 5

Lasso hyperparameter tuning results for the four different biomarkers: (A) %LAP+, (B) %GARP+, (C) MFI LAP, and (D) MFI GARP. The plots show the mean training and testing \({R}^{2}\) and RMSE scores for each of the hyperparameter \(\lambda\) values from all N epochs of 3-repeated stratified 8-fold cross-validation. A total of 200 \(\lambda\) values are shown in the plot. Shaded regions represent one standard error. The two plots for each subfigure A–D share an x-axis. The numbers above the \({R}^{2}\) plots indicate the mean number of selected features, rounded up using the ceiling function. For each plot, the vertical solid black line marks the \(\lambda\) value corresponding to the maximum mean testing \({R}^{2}\), and the vertical dashed black line marks the selected \(\lambda\), which was the largest \(\lambda\) whose mean testing \({R}^{2}\) was at least as large as the lower bound of the 95% confidence interval for the maximum mean testing \({R}^{2}\).

Lasso accurately estimates novel donor biomarkers

The estimation results with the tuned Lasso model are shown in Figure 6. The trained, accepted Lasso model was used to estimate the biomarkers of the external validation donors 5 and 8. The trained weight vectors \({\varvec{\upbeta}}\) used for the estimates in Figure 6 are shown in Supplementary Tables 1-4 and described in Section 2 of the Supplementary Information, along with the potential biological assignments of the selected features35. The biomarkers %LAP+ and MFI GARP had the most accurate estimates for the validation donors, with \({R}^{2}>0.90\), while %GARP+ and MFI LAP still had validation donor estimates with \({R}^{2}>0.75\). The biomarker %GARP+ had the worst accuracy, which can be attributed to its high level of donor-to-donor variability: in Figure 1B, %GARP+ varied the most across donors, as evidenced by the behavior of donors 7 and 8 between days 7 and 9 compared to donors 1-6. The trained model was unable to reliably capture this variability with such a small sample size.

Fig. 6
figure 6

External donor validation of the trained, accepted Lasso model. Estimates of the four biomarkers during days 5, 7, 9, …, 15 of culturing, with restimulation on day 11, using Lasso. Plots A1-D1 show the estimates for the model-building donors, and plots A2-D2 show the estimates for the external validation donors. The \({R}^{2}\) and RMSE values correspond to all the days and all the donors in each plot. The colors correspond to the different donors. Solid circles are biomarker measurements and open squares are the corresponding estimates. The solid line is meant to show the trend in the estimates and is not meant to interpolate biomarker values between days.

In many of the weight vectors in Supplementary Tables 1-4, there were instances of a large weight corresponding to a feature in the Raman spectrum with a low measured spectral intensity. As a result, the magnitude of a weight alone was insufficient to discern the importance of a selected feature in the regression. For example, in Supplementary Table 4, the RNA band at 811 cm-1 has a weight of 12461 but a mean BEC of only 20; the contribution of that selected feature to the biomarker estimation is therefore low. Other examples of features with large least-squares weights but small BECs are 485 cm-1 for %LAP+ and 1053 cm-1 for %GARP+.

Discussion

In this work, we presented a data-driven method to analyze cultured Tregs, which have many potential therapeutic applications. This work illustrates the utility of analyzing Raman spectroscopy data with ML methods, particularly with a relatively limited number of Treg donor samples. The abundant and highly complex Raman spectral data from Tregs can be parsed effectively using Lasso, with which we obtained practical and accurate estimates of four Treg biomarkers for novel samples. The estimates in this work were obtained using simple linear regression models. Our study lays the foundation for future work using more advanced machine learning and deep learning methods, such as convolutional neural networks, to understand the correlation between Raman spectroscopy and cell activation biomarkers. Further work would also involve applying the framework in this study to other settings: this work studied only correlations between Raman spectroscopy and flow cytometry measurements of Treg biomarkers, but the ML models used here are robust and generalizable and could be implemented to study correlations between other cellular biomarkers and nuclear magnetic resonance (NMR) or other spectroscopic data. As long as high-quality training data are provided, these ML models can be readily trained and, as shown here, can estimate novel samples with high accuracy.

Here, we have demonstrated the potential of Raman spectroscopy to serve as a surrogate assay for conventional bioanalytical methods, which are often destructive and require sampling, and as such are not suited to the on-line analyses that are desirable in large-scale cell manufacturing processes. Validating these results with on-line Raman probes integrated into cell manufacturing devices should enable improved label-free monitoring of these processes without the need for sampling. By exporting the trained model parameters for Lasso, or for another ML model, a robust workflow can be constructed to estimate different biomarkers.

Materials and methods

Cell culture

All methods were carried out in accordance with relevant guidelines and regulations. Human research was approved by the University of British Columbia Research Ethics Board (H17-01490). Informed consent was obtained from the legal guardians of the thymus donors. Thymus-derived Tregs were isolated and expanded as previously described in MacDonald et al.43 Thymus donors ranged from 6 to 24 months of age, and the analyses did not discriminate by donor sex. In brief, thymus tissue was dissociated using the McIlwain tissue chopper (Campden Instruments Ltd., Loughborough, England) or the gentleMACS Dissociator (Miltenyi Biotec, Bergisch Gladbach, Germany), then Tregs were isolated by magnetic selection using CD25 positive selection followed by CD8 depletion (both STEMCELL Technologies, Vancouver, Canada). Isolated cells were cryopreserved in CryoStor 10 (STEMCELL Technologies) prior to expansion. Tregs were activated with Dynabeads Treg Xpander (Thermo Fisher Scientific, Waltham, MA, USA) at a 4:1 bead-to-cell ratio and expanded in ImmunoCult-XF T Cell Medium (STEMCELL Technologies) with 1% penicillin/streptomycin (Thermo Fisher Scientific), 1000 IU/mL recombinant human IL-2 (Proleukin, San Diego, CA, USA), and 100 ng/mL rapamycin (Sigma Aldrich, St Louis, MO, USA). Cultures were fed every 2 days starting from day 3 by adding additional media and IL-2. Additional rapamycin was added on days 3 and 5 and discontinued at day 7. Cultures were restimulated on day 11 by adding additional Dynabeads Treg Xpander at a 1:1 bead-to-cell ratio.

Treg samples were collected on day 0 and then at 2-day intervals from day 3 of the expansion process onwards. From each sample, approximately 100,000-200,000 cells were used for flow cytometry and 500,000 cells were saline dry-fixed for Raman spectroscopy using previously described methods44.

Flow cytometry

Extracellular marker and fixable viability dye staining (Thermo Fisher Scientific) was performed in phosphate buffered saline with Brilliant Stain Buffer Plus (BD Biosciences, Franklin Lakes, NJ, USA), then cells were fixed, and intracellular staining was performed using the FOXP3/Transcription Factor Staining Buffer Set (Thermo Fisher Scientific). Data were acquired on a BD FACSymphony or BD LSRFortessa X-20. Analysis was performed using FlowJo version 10. The mean fluorescence intensity (MFI) reported was the geometric mean of the indicated population.

Raman spectroscopy

Raman spectra were collected using an inVia Raman microspectrometer (Renishaw, Gloucestershire, UK) equipped with a 785 nm laser that generated approximately 150 mW of power at the sample. At least 60 spectra were collected per sample, with a 10-second integration time per spectrum. Each spectrum is estimated to contain information from 10-15 cells. Raman spectra were preprocessed using an automated software suite developed in-house45. Baseline-corrected, smoothed Raman spectra were averaged and normalized to the total nucleic acid signal at approximately 782 cm-1 for subsequent analyses.
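A minimal sketch of the normalization step, assuming baseline-corrected, averaged spectra and an array of Raman shift values; the ±2 cm-1 window follows the description in the Figure 2 caption, and the function and variable names are ours, not part of the in-house software.

```python
import numpy as np

def normalize_to_nucleic_acid(shifts, spectrum, band=782.0, window=2.0):
    """Divide a spectrum by the most intense point within +/- `window`
    wavenumbers of the nucleic acid band near `band` cm^-1."""
    mask = np.abs(shifts - band) <= window
    return spectrum / spectrum[mask].max()

# Illustrative usage with synthetic data (963 Raman shifts).
shifts = np.linspace(400, 1800, 963)
spectrum = np.random.default_rng(6).random(963) + 1.0
normalized = normalize_to_nucleic_acid(shifts, spectrum)
```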

Hyperparameter tuning

Optimal model hyperparameters were tuned for each biomarker for each of the five linear ML models. An internal script was written following the steps of the schematic in Supplementary Figure 1. For the \(8\) donors, the script randomly generated \(8\) new blocks (one per donor), each containing a randomly selected day 5, day 7, day 9, day 11, day 13, and day 15 spectrum. The script then performed 5-repeated 8-fold cross-validation by iterating through 8 epochs, in which for each epoch \(i\), block \(i\) was used as the testing set and the remaining 7 blocks were used as the training set. This 8-fold cross-validation was run for 200 values of the hyperparameter within a chosen interval. For each epoch, training was performed using the scikit-learn package in Python, and estimates were calculated with the trained model using scikit-learn functions. The model performance metrics \({R}^{2}\) (coefficient of determination) and RMSE (root mean-square error) were determined for the training and testing biomarker values and their estimates using scikit-learn functions. For each hyperparameter value, \(N=40\) cross-validation scores were obtained over all epochs and repeats. The hyperparameter with the highest mean testing \({R}^{2}\), denoted \({\lambda }_{max}\) for Lasso and written similarly for the other models, was identified. The standard error was calculated as \(\sigma /\sqrt{N}\), where \(\sigma\) is the standard deviation of the scores. For Ridge, PLS, and LSVR, the hyperparameter that maximized the mean testing \({R}^{2}\) was selected. For Lasso, in order to decrease the number of selected features, the selected hyperparameter was instead the largest value \(\lambda > {\lambda }_{max}\) whose mean testing \({R}^{2}\) was at least as large as the lower bound of the 95% confidence interval for the mean testing \({R}^{2}\) at \({\lambda }_{max}\). The five machine-learning models were then evaluated with their optimal hyperparameters by 50-repeated stratified 8-fold cross-validation, and the distributions of testing \({R}^{2}\) and RMSE were compared for model selection.
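The sketch below outlines the Lasso branch of this tuning procedure: repeated leave-one-block-out 8-fold cross-validation over a grid of \(\lambda\) values, followed by selection of the largest \(\lambda\) whose mean testing \({R}^{2}\) stays above the lower 95% confidence bound of the best mean score. The block construction, grid size (the study used 200 values), and data are illustrative placeholders, not the study's actual script.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score

rng = np.random.default_rng(7)
n_donors, n_days, m = 8, 6, 963
# One block per donor: an (n_days, m) spectrum matrix and n_days biomarker values.
blocks_X = [rng.random((n_days, m)) for _ in range(n_donors)]
blocks_Y = [rng.random(n_days) for _ in range(n_donors)]

lambdas = np.logspace(-4, 1, 25)          # hyperparameter grid (study: 200 values)
n_repeats = 5
scores = np.zeros((len(lambdas), n_repeats * n_donors))   # N = 40 scores per lambda

for r in range(n_repeats):
    for test_d in range(n_donors):        # 8 epochs: hold out one block per epoch
        train_d = [d for d in range(n_donors) if d != test_d]
        X_tr = np.vstack([blocks_X[d] for d in train_d])
        Y_tr = np.concatenate([blocks_Y[d] for d in train_d])
        X_te, Y_te = blocks_X[test_d], blocks_Y[test_d]
        for i, lam in enumerate(lambdas):
            model = Lasso(alpha=lam, max_iter=10_000).fit(X_tr, Y_tr)
            scores[i, r * n_donors + test_d] = r2_score(Y_te, model.predict(X_te))

mean_r2 = scores.mean(axis=1)
sem = scores.std(axis=1, ddof=1) / np.sqrt(scores.shape[1])   # sigma / sqrt(N)

i_max = int(np.argmax(mean_r2))
lower_bound = mean_r2[i_max] - 1.96 * sem[i_max]              # lower 95% confidence bound
# Largest lambda (at or above lambda_max) whose mean testing R^2 stays above the bound.
candidates = [i for i in range(i_max, len(lambdas)) if mean_r2[i] >= lower_bound]
lambda_selected = lambdas[max(candidates)]
print(lambda_selected)
```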

Model training

The validation set was constructed by selecting, at random, one of donors 1-6 and one of donors 7-8. The remaining donors comprised the model-building set (Supplementary Figure 2). Model training was performed in Python using scikit-learn functions. The model-building set was randomly split into training and testing sets comprising 70% and 30% of the original set, respectively; the Lasso model was then trained on the training set, after which estimates of the training and testing \(\mathbf{Y}\) vectors were obtained. To compare the training and testing estimates with the true biomarker values, the metrics \({R}^{2}\) and RMSE were determined. The model was accepted only if the \({R}^{2}\) between the measured testing biomarker values of the model-building set, \({\mathbf{Y}}_{test}\), and the corresponding estimated testing set biomarker values, \({\mathbf{Y}}'_{test}\), satisfied \({R}^{2}>0.90\); otherwise, the random splitting and training were repeated. This deliberate model-selection step was performed to mitigate the issue of small sample sizes that is inherent to works with clinical samples, including this work. Its purpose was to show that there exist model parameters, obtainable from the model-building set donors, that can be exported and used for robust estimation of biomarker values in novel settings; specifically, this selection was done to find the best parameters given the small sample size. The accepted trained model was then used to estimate the biomarker values of the validation set. We thus implemented the model-validation step after model selection to ensure the accuracy and robustness of the final model despite this deliberate selectivity.
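A schematic sketch of this acceptance-and-validation loop under the stated \({R}^{2}>0.90\) criterion; the donor splits, the tuned alpha, and the data arrays are placeholders (with random data the criterion will typically not be met, so the loop is capped here).

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)

# Placeholder model-building set (6 donors x 6 days) and validation set (2 donors x 6 days).
X_build, Y_build = rng.random((36, 963)), rng.random(36)
X_valid, Y_valid = rng.random((12, 963)), rng.random(12)
alpha_selected = 0.05            # tuned hyperparameter from the previous step (placeholder)

accepted = None
for attempt in range(100):       # repeat the random 70/30 split until the criterion is met
    X_tr, X_te, Y_tr, Y_te = train_test_split(
        X_build, Y_build, test_size=0.30, random_state=attempt)
    model = Lasso(alpha=alpha_selected, max_iter=10_000).fit(X_tr, Y_tr)
    if r2_score(Y_te, model.predict(X_te)) > 0.90:
        accepted = model
        break

if accepted is not None:
    # External donor validation with the accepted model.
    print(r2_score(Y_valid, accepted.predict(X_valid)))
```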