Introduction

The implementation of self-driving laboratories in materials science is a promising approach to accelerate material discovery and optimization1,2,3. In physical vapor deposition (PVD) of thin-film materials, the traditional human-led process encompasses numerous cycles of selecting deposition parameters, performing deposition, characterizing film properties, and re-adjusting deposition parameters accordingly. The field eagerly needs to harness the capabilities of machine learning (ML) and robotics to streamline and accelerate this iterative process.

Several studies have sought to integrate ML with the PVD process. Typically, these approaches involved training ML models that map deposition parameters, such as substrate temperature, deposition rate, and flux ratio, to material properties, such as stoichiometry4,5,6, electrical conductivity7,8,9, surface morphology10,11, crystallinity12,13,14,15,16,17, and superconducting critical temperatures18. The trained models are then used to predict material properties, with Bayesian optimization (BO) frequently employed to autonomously determine the deposition parameters for subsequent samples9,19,20,21,22,23. Yet, PVD is intrinsically sensitive to subtle differences in substrate conditions and chamber environments, challenging the notion of a definitive mapping between deposition parameters and sample properties, and making traditional BO-based models difficult to implement8,18. This calls for sample-specific, on-the-fly decision making to determine the optimal deposition condition.

Beyond the incorporation of ML algorithms, the realization of self-driving PVD hinges on the complete automation of hardware systems. PVD systems require high vacuum (HV) or ultra-high vacuum (UHV) environments, which present significant challenges for fully automating sample transfer and characterization processes. As such, most studies in the field of ML-assisted thin-film deposition still rely on manual handling of samples, which limits sample throughput and hinders the realization of fully autonomous PVD9,18,24. Shimizu et al. demonstrated a system that fully automates the deposition of Nb-doped TiO2 films and the minimization of their resistance25. However, the system requires a multi-chamber setup with sophisticated transfer mechanisms, which increases its complexity and limits its large-scale deployment. Harris et al. developed an automated pulsed laser deposition (PLD) system with a wheel-like sample holder that holds up to 10 samples; this capacity is insufficient for high-throughput experiments, and reloading samples requires human intervention23. Therefore, achieving self-driving PVD systems with a streamlined setup is crucial for the development of the field26.

In this work, we extend the concept of ML-aided PVD by developing a fully self-driving PVD platform. Our system integrates a UHV chamber with a 72-slot robotic sample handling system, in-situ optical characterization, and machine learning into a closed-loop workflow. We demonstrate the autonomous deposition of silver thin films with optical reflectivities deviating from user-specified targets by less than 0.025 in an average of 2.3 attempts. Additionally, the platform employs a calibration strategy that systematically captures the effect of fluctuations in the deposition conditions on the film properties. It further uncovers nuanced relationships, such as those between effusion cell temperature and sample absorptivity, and explores the extent to which the optical spectrum can be engineered. These results validate a scalable self-driving PVD platform and highlight the transformative potential of self-driving laboratories to drastically accelerate material discovery and optimization.

Results

System design

To demonstrate the principles of self-driving PVD, we seek to fabricate silver thin films with user-specified optical properties. The optical properties of silver thin films are extremely sensitive to deposition parameters. In general, the effusion cell temperature sets the arrival rate of adatoms, and the deposition time determines the film thickness and microstructure. Reflectivity rises with thickness and approaches the bulk limit as the film becomes optically opaque27. Ultra-thin silver first nucleates as isolated islands that gradually coalesce into a continuous layer as the thickness increases, shifting the effective dielectric function during this process28. Higher cell temperatures increase adatom flux, shortening surface-relaxation times, leading to a smaller void fraction and a larger extinction coefficient29. Grain size further tunes free-electron scattering and thus both the real and imaginary parts of the refractive index30,31. These coupled mechanisms are difficult to model with simple physical laws, which warrants ML-driven material optimization and makes silver thin films an ideal testbed for self-driving PVD.

The automated PVD system with in-situ optical characterization capabilities is described in Methods (Fig. 1c). The deposition parameters are effusion cell temperature (T) and deposition time (t). The transmitted (Pt) and reflected power (Pr) of the deposited silver thin films are measured, and we define the reflected power ratio \({\mathscr{R}}\) and absorptivity \({\mathscr{A}}\):

$${\mathscr{R}}=\frac{{P}_{r}}{{P}_{r}+{P}_{t}},\quad {\mathscr{A}}=\frac{{P}_{i}-{P}_{r}-{P}_{t}}{{P}_{i}}$$

where Pi denotes the incident power. For convenience, the reflected power ratio and the absorptivity at 443 nm are denoted as \({{\mathscr{R}}}_{443}\) and \({{\mathscr{A}}}_{443}\), respectively, and similarly for other wavelengths. To maximize data acquisition efficiency, during deposition, the system cycles through each wavelength in sequence and performs measurement at each wavelength (Supplementary Fig. 1). Each complete cycle takes 98 seconds, as determined by the linear rail speed and a 5-second measurement period per wavelength.
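As a minimal illustration of the two definitions above, the following Python sketch computes \({\mathscr{R}}\) and \({\mathscr{A}}\) from measured powers. The function names and the numerical values are hypothetical and are not taken from the acquisition scripts used in this work.

```python
def reflected_power_ratio(p_r: float, p_t: float) -> float:
    """R = P_r / (P_r + P_t): share of the outgoing power that is reflected."""
    total = p_r + p_t
    if total <= 0:
        raise ValueError("reflected plus transmitted power must be positive")
    return p_r / total


def absorptivity(p_i: float, p_r: float, p_t: float) -> float:
    """A = (P_i - P_r - P_t) / P_i: share of the incident power that is absorbed."""
    if p_i <= 0:
        raise ValueError("incident power must be positive")
    return (p_i - p_r - p_t) / p_i


# Hypothetical powers in arbitrary units (illustrative only).
p_i, p_r, p_t = 1.00, 0.57, 0.38
R = reflected_power_ratio(p_r, p_t)   # 0.57 / 0.95, about 0.6
A = absorptivity(p_i, p_r, p_t)       # 0.05 / 1.00, about 0.05
```

Because \({\mathscr{R}}\) is normalized by Pr + Pt rather than by Pi, it does not depend on the incident power, whereas \({\mathscr{A}}\) does.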

Fig. 1: A self-driving physical vapor deposition system for silver thin-film deposition.

a The active learning cycle incorporates (i) identification of the deposition condition with the highest model uncertainty, (ii) sample deposition, and (iii) updating the model with new data. b The adaptive testing cycle incorporates (i) prediction of the optimal deposition condition that minimizes the loss function, (ii) sample deposition, and (iii) assessment of success and model updating. c Schematic illustration of the autonomous deposition setup featuring robotic sample handling and in-situ optical characterization.

All hardware control and data acquisition are managed by MATLAB scripts, which allow the system to deposit and characterize up to 72 samples consecutively without human intervention. The optical characterization data are fed into the ML algorithm, which predicts the deposition condition required for the sample to attain user-specified \({\mathscr{R}}\) values at targeted wavelengths. Our optical measurement workflow differs from previous thin-film deposition works that implemented in-situ, real-time optical reflectivity measurements: in those works, the measurements served as a monitor of the film thickness23 or provided parameters for solving equations that govern the deposition process32, and were not used to train an ML model.

Calibration layer

In thin-film deposition, slight variations in substrate and chamber conditions can significantly affect deposition dynamics33. These parameters, such as the substrate surface roughness and the composition of the residual gas in the chamber, cannot be measured exhaustively and can cause irreproducibility in the thin-film deposition process8,18. Since noise in the training data can significantly degrade ML model performance, the field of ML-assisted thin-film deposition needs a systematic approach to account for these “hidden parameters”.

To account for this variability, we introduce a physical calibration layer. For each sample, we first deposit a calibration layer under a set of universal conditions: an effusion cell temperature of 875 °C and a deposition time of 1000 seconds. This initial layer, approximately 5 nm thick, serves as a probe of the intrinsic variations in the deposition condition. By measuring the \({{\mathscr{R}}}_{689}\) of this layer, denoted as \({{\mathscr{R}}}_{c}\), we obtain a quantitative indicator that partially captures the effect of the hidden parameters and enables the self-driving system to adapt to the specific substrate and chamber conditions on the fly. Figure 2 schematically illustrates this two-stage process: the calibration layer is deposited first, followed by the primary deposition with measurements at alternating wavelengths. Table 1 provides a snippet of the dataset structure. Note that \({{\mathscr{R}}}_{c}\), though a measured value, is treated as an input parameter of the model in the dataset. For further details and performance assessment of this approach, please refer to Supplementary Information Section 2.

Fig. 2: Illustration of the data acquisition process for each iteration in the active learning stage.

The calibration layer is first deposited to determine the \({{\mathscr{R}}}_{c}\), followed by measuring optical properties at all 5 wavelengths in cycles.

Table 1 Exemplary dataset

Machine learning setup

We employ Gaussian Process Regression (GPR), a flexible, non-parametric, and probabilistic machine learning method, for mapping the deposition parameters to the optical properties of the silver thin films. Unlike conventional regression techniques that output a single predicted value, GPR provides a probabilistic framework that estimates both the predicted mean and the associated uncertainty34. Denoting the GPR output as

$${{\rm{GPR}}}_{y}({\bf{x}})=\left({\mu }_{y}({\bf{x}}),\,{\sigma }_{y}({\bf{x}})\right),$$

we refer to μy as the “predicted mean” and σy as the “predicted uncertainty.” Two separate models, \({{\rm{GPR}}}_{{\mathscr{R}}}\) and \({{\rm{GPR}}}_{{\mathscr{A}}}\), are trained for the reflected power ratio \({\mathscr{R}}\) and the absorptivity \({\mathscr{A}}\), respectively, each employing a Radial Basis Function (RBF) kernel. In both cases, the input vector is

$${\bf{x}}=\left(T,\,t,\,\lambda ,\,{{\mathscr{R}}}_{c}\right),$$

where T is deposition temperature, t is deposition time, λ is incident wavelength, and \({{\mathscr{R}}}_{c}\) represents the calibration-layer reflectance. The corresponding model outputs are \(({\mu }_{{\mathscr{R}}},{\sigma }_{{\mathscr{R}}})\) and \(({\mu }_{{\mathscr{A}}},{\sigma }_{{\mathscr{A}}})\).

Our approach adopts the explore-then-commit strategy widely used in BO35,36,37 and is divided into two stages: active learning and adaptive testing. In the active learning stage, the ground-truth growth conditions and optical characterization data are used to train ML models, which then determine the next point to explore according to the model uncertainty (Fig. 1a). This process efficiently navigates the complex input space. In the adaptive testing stage, the trained model guides the system to fabricate silver thin films with user-specified optical properties and constantly adapts to its own prediction errors (Fig. 1b). The model is continuously updated with testing data, enabling ongoing refinement and improved predicting accuracy.

Active learning

The model is first initialized by pre-training on 9 samples, for which the deposition parameter T is uniformly sampled within the range of [820, 880] °C and each film is deposited for a time up to

$$t_{\max }(T)=3.22\times {10}^{14}\,{e}^{-0.0285\,T}\,\,{\rm{s}},\quad T\,\,{\rm{in}}\,\,{}^{\circ }{\rm{C}}$$
(1)

where \({t}_{\max }\) is determined such that each sample reaches a thickness yielding an \({\mathscr{R}} > 0.8\) for all wavelengths. This ensures the generation of data over a wide range of \({\mathscr{R}}\) values for training the model, while also avoiding the unnecessary time spent depositing films until \({\mathscr{R}}\) asymptotically approaches 1. The functional form of \({t}_{\max }(T)\) is motivated by the fact that vapor pressure (and hence the deposition rate) increases exponentially with temperature.
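To give a sense of scale, Eq. (1) can be evaluated at the two ends of the temperature range. The sketch below is illustrative only; the function and variable names are hypothetical.

```python
import math


def t_max(T_celsius: float) -> float:
    """Maximum deposition time in seconds from Eq. (1); T is in degrees Celsius."""
    return 3.22e14 * math.exp(-0.0285 * T_celsius)


t_cold = t_max(820.0)    # longest run, at the lower bound of the temperature range
t_hot = t_max(880.0)     # shortest run, at the upper bound
ratio = t_cold / t_hot   # equals exp(0.0285 * 60), independent of the prefactor
```

With these coefficients, the maximum deposition time is roughly 2.3 × 10^4 s at 820 °C and 4.1 × 10^3 s at 880 °C, a factor of about 5.5 across the 60 °C window.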

The system proceeds to the active learning stage. After each sample’s calibration layer is deposited and \({{\mathscr{R}}}_{c}\) is measured, T for the subsequent deposition is selected within the range of [820, 880] °C according to:

$${T}_{{\rm{selected}}}=\arg \mathop{\max }\limits_{T}\left\{\overline{{\sigma }_{{\mathscr{R}}}}(T,{{\mathscr{R}}}_{c})\right\},$$
(2)

where

$$\overline{{\sigma }_{{\mathscr{R}}}}(T,{{\mathscr{R}}}_{c})=\frac{\sqrt{\mathop{\sum }\nolimits_{t = 0}^{{t}_{\max }(T)}{\sum }_{\lambda }{\sigma }_{{\mathscr{R}}}{(T,{{\mathscr{R}}}_{c},t,\lambda )}^{2}}}{{t}_{\max }(T)}.$$
(3)

Here, \(\overline{{\sigma }_{{\mathscr{R}}}}(T,{{\mathscr{R}}}_{c})\) represents the uncertainty of \({\mathscr{R}}\) averaged over time and wavelengths. Since data are collected over time and wavelengths, as depicted in Fig. 2, we reduce the dimensionality of \({\sigma }_{{\mathscr{R}}}(T,{{\mathscr{R}}}_{c},t,\lambda )\) into \(\overline{{\sigma }_{{\mathscr{R}}}}(T,{{\mathscr{R}}}_{c})\). The 98-second data collection interval is short enough that missing a point of high interest between measurements is not a concern. Moreover, \({{\mathscr{R}}}_{c}\) is a measured variable reflecting the substrate and chamber conditions and cannot be set by the model. Hence, this BO problem, originally a complex 4-dimensional one, is effectively reduced to a 1-dimensional constrained optimization problem. The model selects the value of T that maximizes \(\overline{{\sigma }_{{\mathscr{R}}}}\), thereby capturing information at the point in the T space with the greatest uncertainty. The sample is then deposited at Tselected and measured at a series of times until \(t={t}_{\max }({T}_{{\rm{selected}}})\), specified by Eq. (1).
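The selection rule of Eqs. (2) and (3) can be sketched in Python as follows. This is a hedged illustration rather than the implementation used in this work: the time grid (one point per 98-second cycle), the 1 °C candidate grid, and the stand-in uncertainty function toy_sigma are all assumptions made for the example; in practice the uncertainty would come from \({{\rm{GPR}}}_{{\mathscr{R}}}\).

```python
import math
from typing import Callable, Sequence

WAVELENGTHS_NM = (443, 514, 689, 781, 817)
CYCLE_S = 98.0  # one measurement cycle over all wavelengths


def t_max(T: float) -> float:
    """Eq. (1): maximum deposition time in seconds, T in degrees Celsius."""
    return 3.22e14 * math.exp(-0.0285 * T)


def mean_sigma(sigma: Callable[[float, float, float, float], float],
               T: float, R_c: float) -> float:
    """Eq. (3): root of the summed squared uncertainties, divided by t_max(T)."""
    limit = t_max(T)
    n_steps = int(limit // CYCLE_S)
    total = 0.0
    for k in range(n_steps + 1):           # t = 0, 98, 196, ... up to t_max(T)
        t = k * CYCLE_S
        for lam in WAVELENGTHS_NM:
            total += sigma(T, t, lam, R_c) ** 2
    return math.sqrt(total) / limit


def select_temperature(sigma, R_c: float, candidates: Sequence[float]) -> float:
    """Eq. (2): candidate T with the largest averaged uncertainty."""
    return max(candidates, key=lambda T: mean_sigma(sigma, T, R_c))


# Toy stand-in for the GPR uncertainty: large only near 850 degrees Celsius.
def toy_sigma(T, t, lam, R_c):
    return 1.0 if abs(T - 850.0) < 0.5 else 0.01


grid = [820.0 + i for i in range(61)]      # 820, 821, ..., 880
T_selected = select_temperature(toy_sigma, R_c=0.03, candidates=grid)
```

Because \({{\mathscr{R}}}_{c}\) is fixed by the measurement of the calibration layer, the search runs over T alone, which is what makes the problem effectively one-dimensional.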

Figure 3a shows the evolution of \(\overline{{\sigma }_{{\mathscr{R}}}}\) during active learning. After 8 iterations, \(\overline{{\sigma }_{{\mathscr{R}}}}\) over the entire parameter space becomes more uniformly distributed and has an average value of 0.032. The uncertainty does not decrease further below this level. The remaining uncertainty is likely due to hidden parameters not accounted for by the calibration layer as well as measurement noise (e.g., uncertainty in laser power measurements). The maximum of \(\overline{{\sigma }_{{\mathscr{R}}}}\) in the parameter space converges to 0.056 after 8 iterations (Fig. 3b), signaling the appropriate point to terminate the active learning process.

Fig. 3: Performance of active learning in the fabrication of silver thin films.

a Evolution of \(\overline{{\sigma }_{{\mathscr{R}}}}\) in the parameter space during active learning. b Convergence of the maximum \(\overline{{\sigma }_{{\mathscr{R}}}}\) during active learning. c Comparing the model prediction errors, defined as the differences between all measured and predicted \({{\mathscr{R}}}_{\lambda }\)'s. The error distribution when adopting active learning and the calibration layer is benchmarked against the case without these techniques.

Since the goal of the system is to achieve arbitrary \({\mathscr{R}}\) requests at the specified wavelengths, we employ active learning to comprehensively sample the full parameter space. Since every target \({\mathscr{R}}\) falls within the range the model has already explored, acquisition functions that seek to push the search outside this range (e.g., expected improvement) are not appropriate for this task38,39.


Adaptive testing: single-wavelength \({\mathscr{R}}\) targets

We select 5 random single-wavelength targets \({{\mathscr{R}}}_{\lambda }^{{\rm{target}}}\), one for each laser wavelength. For a given \({{\mathscr{R}}}_{c}\), infinitely many (T, t) pairs can achieve the requested \({\mathscr{R}}\) at the specified wavelength. This degeneracy is removed by additionally minimizing \({{\mathscr{A}}}_{\lambda }\). The loss function for each single-wavelength target is defined as

$$\begin{array}{rcl}{{\mathscr{L}}}_{\lambda }&=&{\left({\mu }_{{\mathscr{R}},\lambda }-{{\mathscr{R}}}_{\lambda }^{{\rm{target}}}\right)}^{2}+4{\sigma }_{{\mathscr{R}},\lambda }^{2}+{\mu }_{{\mathscr{A}},\lambda }^{2}+4{\sigma }_{{\mathscr{A}},\lambda }^{2}\\ &&\,\text{if}\,| {\mu }_{{\mathscr{R}},\lambda }-{{\mathscr{R}}}_{\lambda }^{{\rm{target}}}| > 0.01:\\ &&{{\mathscr{L}}}_{\lambda }\mapsto {{\mathscr{L}}}_{\lambda }+100{\left({\mu }_{{\mathscr{R}},\lambda }-{{\mathscr{R}}}_{\lambda }^{{\rm{target}}}\right)}^{2}\end{array}$$
(4)

The uncertainties of predicted \({\mathscr{R}}\) and \({\mathscr{A}}\) are added to the loss function to penalize deposition conditions with high uncertainties. The loss increases rapidly when the difference between the target and predicted \({\mathscr{R}}\) exceeds 0.01 to prioritize proximity to the target value of \({\mathscr{R}}\).
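A minimal Python sketch of the loss in Eq. (4), evaluated for one candidate deposition condition. The function name and the example values are hypothetical; the four model outputs would come from \({{\rm{GPR}}}_{{\mathscr{R}}}\) and \({{\rm{GPR}}}_{{\mathscr{A}}}\).

```python
def single_wavelength_loss(mu_R: float, sigma_R: float,
                           mu_A: float, sigma_A: float,
                           target_R: float, tol: float = 0.01) -> float:
    """Eq. (4): target mismatch, absorptivity, and uncertainty penalties."""
    miss = mu_R - target_R
    loss = miss ** 2 + 4.0 * sigma_R ** 2 + mu_A ** 2 + 4.0 * sigma_A ** 2
    if abs(miss) > tol:
        loss += 100.0 * miss ** 2   # extra penalty outside the tolerance band
    return loss


# Two hypothetical candidates that differ only in the predicted mean of R.
inside = single_wavelength_loss(0.505, 0.02, 0.10, 0.01, target_R=0.50)
outside = single_wavelength_loss(0.530, 0.02, 0.10, 0.01, target_R=0.50)
```

In this example the candidate inside the 0.01 band has a loss of about 0.012, while the one outside it has a loss of about 0.103, almost entirely from the additional mismatch penalty.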

The loss function is minimized using the Adam optimizer40. The set of deposition parameters (T, t) at the minimum of the loss function is used for deposition. After deposition, the model is updated with the new measurement data, and the sample’s measured \({{\mathscr{R}}}_{\lambda }\) is compared to \({{\mathscr{R}}}_{\lambda }^{{\rm{target}}}\). If

$$| {{\mathscr{R}}}_{\lambda }^{{\rm{measured}}}-{{\mathscr{R}}}_{\lambda }^{{\rm{target}}}| < 0.025$$
(5)

the deposition is considered successful, and the system moves to the next target. If unsuccessful, the model adapts to the new data and re-attempts the target until success. The threshold of 0.025 is chosen because this level of accuracy is sufficient for most applications and closely matches our model’s predictive performance (mean absolute error in pre-training cross-validation ≈ 0.029, see Supplementary Information Fig. 3), making it both practically meaningful and realistically attainable.
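The adaptive testing cycle can be summarized by the following Python sketch. It is a schematic under stated assumptions, not the control code of the platform: the callables propose, deposit, and update are hypothetical placeholders for loss minimization, the physical deposition with measurement, and model refitting, and the max_attempts cap is added here only so that the loop is guaranteed to terminate.

```python
from typing import Callable, List, Tuple

Params = Tuple[float, float]  # (T in degrees Celsius, t in seconds)


def adaptive_test(target_R: float,
                  propose: Callable[[float], Params],
                  deposit: Callable[[Params], float],
                  update: Callable[[Params, float], None],
                  tol: float = 0.025,
                  max_attempts: int = 10) -> List[Tuple[Params, float]]:
    """Repeat propose -> deposit -> update until Eq. (5) is satisfied."""
    history: List[Tuple[Params, float]] = []
    for _ in range(max_attempts):
        params = propose(target_R)     # minimizer of the loss in Eq. (4)
        measured = deposit(params)     # measured R at the target wavelength
        update(params, measured)       # the model is refit after every sample
        history.append((params, measured))
        if abs(measured - target_R) < tol:
            break                      # success: move on to the next target
    return history


# Toy stand-ins (not the real system): the film reflects 0.04 less than the
# initial model expects, and the "model" learns this bias from one sample.
model = {"bias": 0.0}


def propose(target_R: float) -> Params:
    return (850.0, 1000.0 * (target_R + model["bias"]))


def deposit(params: Params) -> float:
    return params[1] / 1000.0 - 0.04


def update(params: Params, measured: float) -> None:
    model["bias"] = params[1] / 1000.0 - measured


history = adaptive_test(0.50, propose, deposit, update)
attempts = len(history)   # 2 with these toy functions
```

In this toy run the first attempt misses the target by 0.04, the update absorbs the bias, and the second attempt succeeds.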

Table 2 displays the results for the 5 single-wavelength targets. On average, 2 attempts were needed to achieve a target. For each of the 10 samples deposited during this stage, the model makes 5 predictions of \({\mathscr{R}}\), one for each wavelength. For these 50 predictions, the mean absolute error (MAE) between the model predictions and the measured results is 0.0246, which demonstrates the accuracy of the predictions. Moreover, the average model-predicted uncertainty over the parameter space is 0.0267; its proximity to the MAE indicates that the model also estimates its own uncertainty accurately.

Table 2 Adaptive testing results

To benchmark the effectiveness of the calibration layer and active learning, we perform a control experiment without these methods. In the control experiment, the model only goes through the pre-training process, in which 17 samples are grown at T between 820 and 880 °C in 3.75 °C increments, and for t as specified in Eq. (1). The system is then requested to produce silver thin films that satisfy the same 5 single-wavelength targets. Although the control experiment uses the same number of training samples as the previous experiment, it requires on average 3.6 attempts to achieve each target. The MAE between the model predictions and the measured results is 0.0618, also significantly higher than in the previous experiment (Fig. 3c). The control experiment hence demonstrates the superior performance of the model using calibration layers and active learning.

With strong predictive power, the trained models provide insights into the optimal deposition strategies that are otherwise difficult to deduce via human trial-and-error or physical intuition. These models can be used to explore various high-dimensional constrained optimization problems. As an illustration, we build on the single-wavelength target testing and generalize the strategy of choosing the effusion cell temperature T that minimizes \({{\mathscr{A}}}_{\lambda }\) for any \({{\mathscr{R}}}_{c}\), \({{\mathscr{R}}}_{\lambda }^{{\rm{target}}}\) and its associated λ. Our results show that the optimal temperature for minimizing \({{\mathscr{A}}}_{\lambda }\) varies significantly with \({{\mathscr{R}}}_{c}\). At lower \({{\mathscr{R}}}_{c}\) values, the model consistently predicts 880 °C to be the optimal temperature (Fig. 4). However, at higher \({{\mathscr{R}}}_{c}\) values, the trend of optimal temperature versus \({{\mathscr{R}}}_{c}\) is more complex and depends on the \({{\mathscr{R}}}_{\lambda }^{{\rm{target}}}\) and its associated λ. As \({{\mathscr{R}}}_{c}\) increases, the optimal temperature could increase (Fig. 4a), be relatively constant (Fig. 4b), decrease then increase (Fig. 4c), or decrease then remain constant (Fig. 4d). Note that T physically mostly reflects the deposition rate. This indicates that at different \({{\mathscr{R}}}_{c}\), that is, in different deposition environments, the deposition rate has an intricate effect on the silver thin films’ optical constants at various λ. These findings underscore the importance of adaptive decision making for each deposition process on the fly.

Fig. 4: Offset plots of predicted absorptance \({\mathscr{A}}\) versus effusion cell temperature T for various \({{\mathscr{R}}}_{c}\) values at specific \({{\mathscr{R}}}_{\lambda }^{{\rm{target}}}\) and its associated λ.

a–d correspond to different combinations of \({{\mathscr{R}}}_{\lambda }^{{\rm{target}}}\) and its associated λ, which demonstrate various strategies of choosing the optimum T for minimizing \({\mathscr{A}}\).

Such nuanced parameter-property maps would be challenging to discern through traditional methods, particularly when balancing multiple process variables and film characteristics. The models trained with the self-driving setup enable systematic exploration of complex relationships between parameters, showing the potential of the setup to uncover subtle dependencies that have eluded purely human-led research41.

Adaptive testing: multi-wavelength \({\mathscr{R}}\) targets

The autonomous PVD system also enables the fabrication of silver thin films satisfying multi-wavelength \({\mathscr{R}}\) targets, effectively specifying a desired spectrum. Depending on the application, one might aim for an \({\mathscr{R}}\) that remains relatively constant across wavelengths to produce a broadband beam splitter, or an \({\mathscr{R}}\) that varies sharply to form a high/low-pass optical filter. As an illustration, we define 2 multi-wavelength targets: \(({{\mathscr{R}}}_{443}^{{\rm{target}}}=0.85,{{\mathscr{R}}}_{781}^{{\rm{target}}}=0.47)\) and \(({{\mathscr{R}}}_{443}^{{\rm{target}}}=0.85,{{\mathscr{R}}}_{781}^{{\rm{target}}}=0.35)\). These choices exemplify efforts to achieve either a shallow or steep slope in \({{\mathscr{R}}}_{\lambda }\) vs λ.

The loss function for a multi-wavelength target is defined as:

$${\mathscr{L}}=\sum _{\lambda \in {\lambda }_{{\rm{targeted}}}}{\left({\mu }_{{\mathscr{R}},\lambda }-{{\mathscr{R}}}_{\lambda }^{{\rm{target}}}\right)}^{2}$$
(6)

As the number of wavelengths specified in the target increases, it is not guaranteed that there exists a set of (T, t), for a given \({{\mathscr{R}}}_{c}\), that produces a film with the desired optical properties. Therefore, when the loss function is minimized and any of the \({\mu }_{{\mathscr{R}},\lambda }\)’s is still > 0.01 away from the target, the algorithm determines the target is unachievable with the given \({{\mathscr{R}}}_{c}\). It aborts the current sample, reports the result as “abort”, and moves on to the next substrate.

A deposition is considered successful if

$$\frac{1}{N}\sum _{\lambda \in {\lambda }_{{\rm{targeted}}}}\left\vert {{\mathscr{R}}}_{\lambda }^{{\rm{measured}}}-{{\mathscr{R}}}_{\lambda }^{{\rm{target}}}\right\vert < 0.025$$
(7)

where N is the number of wavelengths being targeted. If successful, the system moves to the next target. If unsuccessful, the model takes account of the new data and re-attempts the target until success.
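The three decisions in the multi-wavelength workflow (the loss of Eq. (6), the abort rule applied at the loss minimum, and the success criterion of Eq. (7)) can be sketched in Python as follows. The function names and the predicted and measured values are hypothetical and serve only to illustrate the logic.

```python
from typing import Dict


def multi_wavelength_loss(mu_R: Dict[int, float], target: Dict[int, float]) -> float:
    """Eq. (6): summed squared mismatch over the targeted wavelengths."""
    return sum((mu_R[lam] - target[lam]) ** 2 for lam in target)


def is_feasible(mu_R: Dict[int, float], target: Dict[int, float],
                tol: float = 0.01) -> bool:
    """Abort rule: at the loss minimum, every predicted R must lie within tol."""
    return all(abs(mu_R[lam] - target[lam]) <= tol for lam in target)


def is_success(measured: Dict[int, float], target: Dict[int, float],
               tol: float = 0.025) -> bool:
    """Eq. (7): mean absolute deviation over the N targeted wavelengths."""
    n = len(target)
    return sum(abs(measured[lam] - target[lam]) for lam in target) / n < tol


target = {443: 0.85, 781: 0.35}
best_prediction = {443: 0.85, 781: 0.38}   # hypothetical minimizer of Eq. (6)
measured = {443: 0.86, 781: 0.33}          # hypothetical measurement

abort = not is_feasible(best_prediction, target)   # 0.03 off at 781 nm: abort
success = is_success(measured, target)             # mean deviation 0.015: success
```

Note that the abort rule is applied to model predictions before deposition, whereas the success criterion is applied to measurements after deposition.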

Table 2 displays the results using multi-wavelength targets. Our system achieves the two multi-wavelength targets in 6 deposition attempts, while for 4 other samples it decides that the \({{\mathscr{R}}}_{c}\) is unfavorable to achieve the target. Moreover, Fig. 5 displays the bounds of attainable spectra predicted by the model, given the accessible \({{\mathscr{R}}}_{c}\) values during this multi-wavelength testing. The 2 successful experimental spectra effectively explored this space of the spectrum, showcasing the versatility of the system in engineering the film’s optical response.

Fig. 5: Predicted and experimental spectral bounds for silver thin films deposited with \({{\mathscr{R}}}_{443}\) fixed at 0.85.

Upper and lower bound curves represent the predicted limits of \({\mathscr{R}}\) across the spectrum (400–850 nm) as determined from model predictions. Experimental spectra for the two multi-wavelength targets closely match the predicted bounds, demonstrating the self-driving PVD's capability to deposit silver thin films with variable spectra.

Furthermore, we investigate the rationale behind the system’s decision to abort certain deposition attempts, specifically for the second multi-wavelength target \(({{\mathscr{R}}}_{443}^{{\rm{target}}}=0.85,{{\mathscr{R}}}_{781}^{{\rm{target}}}=0.35)\). The algorithm assesses feasibility based on the measured \({{\mathscr{R}}}_{c}\) of the sample. To evaluate this, we plot the minimum achievable \({{\mathscr{R}}}_{781}\) as a function of \({{\mathscr{R}}}_{c}\) while constraining \({{\mathscr{R}}}_{443}\) = 0.85 (Fig. 6a). The model predicts that for \({{\mathscr{R}}}_{c} < 0.0292\), the target \({{\mathscr{R}}}_{781}=0.35\) is unattainable, leading the system to abort the initial few deposition attempts that yield low \({{\mathscr{R}}}_{c}\) values. Later samples with higher \({{\mathscr{R}}}_{c}\) values show feasibility of meeting the target, prompting the system to proceed with the deposition (Fig. 6b).

Fig. 6: Rationale for on-the-fly decision making on deposition attempts.

a Predicted minimum achievable \({{\mathscr{R}}}_{781}\) as a function of the calibration-layer reflectance \({{\mathscr{R}}}_{c}\). The solid curve represents the model-predicted lower bound for \({{\mathscr{R}}}_{781}\) obtained by exploring the deposition parameter space. The annotated intersection marks the minimum \({{\mathscr{R}}}_{c}\) value at which the predicted \({{\mathscr{R}}}_{781}\) meets the target of 0.35. b Experimental results showing the measured \({{\mathscr{R}}}_{c}\) for each sample along with the corresponding decision made by the algorithm. Samples 1, 2, and 4 exhibit calibration values only marginally above the threshold required to achieve the target, causing the optimizer to fail in finding satisfactory deposition parameters on the fly and resulting in the termination of these deposition attempts.

Note that the data points in Fig. 6a are generated via brute-force iteration over all deposition parameter combinations, guaranteeing identification of the global minimum of \({{\mathscr{R}}}_{781}\) for each \({{\mathscr{R}}}_{c}\). In contrast, during experiments the algorithm selects deposition parameters on the fly by minimizing the loss function in Eq. (6), so it may converge to a local rather than the global minimum. This explains why the system still decides to abort samples 1, 2, and 4, although their \({{\mathscr{R}}}_{c}\) values are slightly above the minimum threshold required to achieve the target. While this limitation could be addressed with a more efficient optimization algorithm, the overall decision-making trend of the algorithm remains rational and effective.
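The brute-force evaluation described above can be sketched as an exhaustive grid search. The Python example below is an illustration under stated assumptions: the predictor toy_predict is a made-up linear function with no physical meaning, the grids are arbitrary, and in practice the predictor would be the trained model evaluated at the sample's measured \({{\mathscr{R}}}_{c}\).

```python
import itertools
from typing import Callable, Iterable, Optional, Tuple


def min_R781_at_fixed_R443(predict: Callable[[float, float, int], float],
                           temperatures: Iterable[float],
                           times: Iterable[float],
                           R443_target: float = 0.85,
                           tol: float = 0.01) -> Optional[Tuple[float, float, float]]:
    """Exhaustive search: smallest predicted R_781 among grid points whose
    predicted R_443 lies within tol of the target. Returns (R_781, T, t),
    or None if no grid point satisfies the constraint."""
    best = None
    for T, t in itertools.product(temperatures, times):
        if abs(predict(T, t, 443) - R443_target) > tol:
            continue
        r781 = predict(T, t, 781)
        if best is None or r781 < best[0]:
            best = (r781, T, t)
    return best


# Made-up predictor for one fixed R_c (illustrative only, not physical).
def toy_predict(T: float, t: float, lam: int) -> float:
    if lam == 443:
        return t / 1000.0
    return t / 1000.0 - (T - 820.0) / 200.0


T_grid = [820.0 + 10.0 * i for i in range(7)]   # 820, 830, ..., 880
t_grid = [50.0 * i for i in range(21)]          # 0, 50, ..., 1000
best = min_R781_at_fixed_R443(toy_predict, T_grid, t_grid)
```

An exhaustive search of this kind visits every grid point and therefore cannot be trapped in a local minimum, at the cost of many more model evaluations than a gradient-based optimizer.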

This decision-making process illustrates an advanced feature of the self-driving setup. The system not only executes experiments based on user-defined targets but also critically evaluates the feasibility of these targets. By aborting experiments unlikely to succeed, the system optimizes the use of time and resources, thereby establishing a more efficient and intelligent experimental workflow.

Discussion

We have demonstrated a fully self-driving physical vapor deposition (PVD) system that integrates advanced hardware automation, in-situ optical spectroscopy, and Bayesian machine learning to achieve targeted silver thin-film growth. This high-throughput setup enabled the deposition of 38 samples during the training and testing phases and the collection of over 20,000 data points without human intervention. By leveraging active learning and incorporating a calibration layer to account for hidden parameters, our system autonomously navigates a complex parameter space to reliably deposit films with optical properties that closely match user-specified targets.

We choose to work with silver thin films because they represent a simple material system that retains the challenging aspects of thin-film deposition. Although the functionality of the current setup is limited by factors such as the lack of control over substrate temperature, which will be addressed in future studies, the methods we have demonstrated (pre-training to map the parameter space, active learning to minimize model uncertainty, and adaptive optimization to achieve specific targets) are broadly applicable to a wide range of thin-film deposition tasks.

Moreover, the in-situ optical measurements can be extended to various other characterization techniques, including other spectroscopic methods and diffraction measurements such as spectroscopic ellipsometry, reflection high-energy electron diffraction (RHEED), and low-energy electron diffraction (LEED). Because ellipsometry, RHEED, and LEED can be performed in situ, they lend themselves to high-throughput data collection and are attractive to integrate with ML16,42,43,44,45,46,47,48. The calibration layer approach, in particular, can be expanded by incorporating additional checkpoints along the deposition trajectory to optimize in a higher-dimensional parameter space. As the number of checkpoints increases, this strategy could eventually lead to a real-time adaptive control framework, where deposition conditions are continuously updated based on immediate feedback. Our work constitutes a key step toward realizing such continuously self-adjusting thin-film growth processes.

Ultimately, our system not only streamlines experimentation, but also exhibits advanced features that pave the way for future self-driving laboratory frameworks. It demonstrates high-dimensional constrained optimization to achieve specified targets, an on-the-fly calibration-layer strategy that accounts for hidden deposition conditions, and real-time feasibility assessment that intelligently terminates unpromising trials. Together, these capabilities markedly elevate the system’s degree of autonomy and extend the frontier of current self-driving laboratories.

Methods

Self-driving physical vapor deposition system

The self-driving PVD system incorporates a shadow mask beneath a 72-slot sample handling system, ensuring that only one sample is exposed to the deposition source at a time (Fig. 1c). Silver (99.999%, Thermo Fisher) is deposited onto double-side polished BK7 glass (MTI) substrates using an effusion cell (MBE-Komponenten) at a base pressure of \(< 5\times {10}^{-9}\) mbar and a deposition pressure of \(1\times {10}^{-8}\) mbar.

The reflectivity and absorptivity of the silver thin films are characterized using five p-polarized lasers with wavelengths (λ) of 443, 514, 689, 781, and 817 nm (Coherent StingRay). The lasers are mounted on a linear rail pointing at the substrate with an incident angle of 45 degrees.

Gaussian process regression model

Let \({\bf{X}}\in {{\mathbb{R}}}^{N\times 4}\) denote the matrix of standardized inputs (temperature T, time t, wavelength λ, and calibration layer reflectance \({{\mathscr{R}}}_{c}\)), and \({\bf{y}}\in {{\mathbb{R}}}^{N}\) the vector of \({\mathscr{R}}\) or \({\mathscr{A}}\). A Gaussian-process prior is placed on the latent function \(f:{{\mathbb{R}}}^{4}\to {\mathbb{R}}\):

$$f({\bf{x}})\, \sim \,{\mathcal{GP}}\left(\mu ({\bf{x}}),\,k({\bf{x}},{{\bf{x}}}^{{\prime} })\right),\quad \mu ({\bf{x}})=0,$$

where μ(x) = 0 is the prior mean, and

$$k({\bf{x}},{{\bf{x}}}^{{\prime} })={\sigma }_{f}^{2}\exp \left(-\frac{1}{2}{({\bf{x}}-{{\bf{x}}}^{{\prime} })}^{\top }{\Lambda }^{-1}({\bf{x}}-{{\bf{x}}}^{{\prime} })\right)$$
$$\Lambda ={\rm{diag}}\left({\ell }_{1}^{2},\ldots ,{\ell }_{4}^{2}\right)$$

Here \({f}_{i}=f({{\bf{x}}}_{i})\) denotes the latent function value at \({{\bf{x}}}_{i}\), \({\sigma }_{f}^{2}\) is the signal variance determining Var[f(x)] under the prior, and \({\ell }_{j}\) is the length-scale along the jth input dimension.
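For concreteness, the kernel above can be evaluated for a pair of standardized inputs as in the following Python sketch. The function name, the length-scales, and the input values are illustrative assumptions, not fitted hyperparameters.

```python
import math
from typing import Sequence


def ard_rbf(x: Sequence[float], x_prime: Sequence[float],
            lengthscales: Sequence[float], signal_var: float = 1.0) -> float:
    """k(x, x') = sigma_f^2 * exp(-0.5 * sum_j ((x_j - x'_j) / l_j)^2)."""
    sq = sum(((a - b) / l) ** 2 for a, b, l in zip(x, x_prime, lengthscales))
    return signal_var * math.exp(-0.5 * sq)


# Standardized inputs ordered as (T, t, lambda, R_c); values are illustrative.
x1 = (0.0, 0.0, 0.0, 0.0)
x2 = (1.0, 0.0, 0.0, 0.0)
ell = (1.0, 2.0, 0.5, 1.0)
k_same = ard_rbf(x1, x1, ell, signal_var=2.0)   # equals sigma_f^2
k_diff = ard_rbf(x1, x2, ell, signal_var=2.0)   # sigma_f^2 * exp(-0.5)
```

A separate length-scale per input dimension lets the model treat, for example, temperature and wavelength as varying on different scales after standardization.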

Observations are assumed noisy:

$${y}_{i}={f}_{i}+{\varepsilon }_{i},\quad {\varepsilon }_{i} \sim {\mathcal{N}}(0,{\sigma }_{n}^{2}),$$

so that

$$p({y}_{i}| {f}_{i})={\mathcal{N}}\left({y}_{i}| {f}_{i},\,{\sigma }_{n}^{2}\right),$$

where σn is the noise standard deviation.

Hyperparameters \(\theta =\{{\sigma }_{f}^{2},{\ell }_{1},{\ell }_{2},{\ell }_{3},{\ell }_{4},{\sigma }_{n}^{2}\}\) are learned by minimizing the negative log-marginal likelihood

$${\mathcal{L}}(\theta )=-\log p({\bf{y}}| {\bf{X}},\theta ),$$

using the Adam optimizer with learning rate of 0.1. Each optimizer step over all N points constitutes one epoch.
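For reference, under the zero-mean prior and Gaussian noise model above, the negative log-marginal likelihood has the standard closed form

$${\mathcal{L}}(\theta )=\frac{1}{2}{{\bf{y}}}^{\top }{\left({\bf{K}}+{\sigma }_{n}^{2}{\bf{I}}\right)}^{-1}{\bf{y}}+\frac{1}{2}\log \det \left({\bf{K}}+{\sigma }_{n}^{2}{\bf{I}}\right)+\frac{N}{2}\log 2\pi ,$$

and the predictive mean and variance of the latent function at a new input \({{\bf{x}}}_{* }\) are

$$\mu ({{\bf{x}}}_{* })={{\bf{k}}}_{* }^{\top }{\left({\bf{K}}+{\sigma }_{n}^{2}{\bf{I}}\right)}^{-1}{\bf{y}},\quad {\sigma }^{2}({{\bf{x}}}_{* })=k({{\bf{x}}}_{* },{{\bf{x}}}_{* })-{{\bf{k}}}_{* }^{\top }{\left({\bf{K}}+{\sigma }_{n}^{2}{\bf{I}}\right)}^{-1}{{\bf{k}}}_{* }.$$

Here \({\bf{K}}\), \({{\bf{k}}}_{* }\), and \({{\bf{x}}}_{* }\) are notation introduced only for this illustration: \({\bf{K}}\) is the N × N matrix with entries \(k({{\bf{x}}}_{i},{{\bf{x}}}_{j})\), and \({{\bf{k}}}_{* }\) is the vector with entries \(k({{\bf{x}}}_{i},{{\bf{x}}}_{* })\). These expressions are the textbook results for Gaussian-process regression; whether the reported predicted uncertainty additionally includes the noise variance \({\sigma }_{n}^{2}\) is not specified here.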