Introduction

Many modern technological challenges crucially depend on the properties of surfaces and interfaces. This includes the control of charge and energy transfer across electrode/electrolyte interfaces in batteries1 and fuel cells2, the characterization of structure and dynamics of wear and lubrication at tribological interfaces3, the optimization of chemical transformations at metal surfaces in heterogeneous catalysis4, corrosion science5, and surface functionalization6. Surface and thin-film growth processes such as atomic layer deposition, chemical vapor deposition, and molecular beam epitaxy are of high industrial relevance. The increasing demand for functional thin films and surface nanostructures also drives increasing structural complexity. As most modern materials involve multiple components, understanding the structure and properties of thin films, composites, buried interfaces, and exposed surfaces is now more important than ever. The atomic-scale characterization and design of functional interfaces require understanding and manipulation at the nanoscale, which often cannot be delivered by experimentation alone.

Computational simulation of surface and interface processes has become central to modern surface science. Few fields rely as strongly on the synergy between atomistic simulation and experimental study. This synergy is achieved by minimizing the gap between experimental complexity and simulation models, often through the use of model surfaces and ultra-high vacuum conditions, which, for example, enable atomic resolution to be achieved in scanning probe microscopy. However, surfaces in real-world applications are often more complex, featuring defects and partial disorder. Additionally, ambient pressure and interacting molecules play crucial roles in many applications, such as catalysis. By advancing the study of complex surface systems and dynamic processes at large length- and time scales with high throughput, machine learning (ML) and data-driven approaches have the potential to bring atomistic simulation and experiment even closer, offering improved mechanistic understanding of surface dynamics, reaction pathways, growth processes, and mechanical and electronic properties.

Common ML methods used in surface science include neural networks (NN), Bayesian regression methods, decision trees, support vector machines, and genetic algorithms. Citations for specific uses are given in later sections. These methods can learn expressions for formation energies, potential energy surfaces (PES), and other properties, provide frameworks to efficiently explore the configuration space of a material, or facilitate the optimization of a target property. Machine-learned interatomic potentials (MLIPs), in particular, are highly impactful: they have revolutionized the simulation of bulk materials and are transforming chemical and biomolecular simulations7,8. MLIPs are omnipresent in surface science and will be discussed throughout this review. Considering the broad range of ML applications in surface science, we do not dedicate a separate chapter to MLIPs. Instead, we discuss them as part of the section on Accurate dynamics at large time- and length scales, as MLIPs are most commonly used to accelerate dynamics simulations.

Motivation of the review

A number of comprehensive reviews have been published on ML from different perspectives such as heterogeneous catalysis9,10,11,12 and experimental surface science13,14. This review targets computational surface scientists seeking to integrate ML techniques, providing an overview of current capabilities and limitations. Surfaces and interfaces present a unique challenge due to complex processes such as charge transfer and bond formation as well as competing interactions such as covalent, electrostatic, dispersion forces, and the coexistence of localized and delocalized electronic states in large unit cells with hundreds or thousands of atoms15. Examples such as the CO on metals puzzle16,17 demonstrate that semi-local Density Functional Theory (DFT) often falls short in accurately describing surfaces and interfaces.

Current challenges in surface science (sketched in Fig. 1), such as the modeling of two-dimensional materials, electronic interface engineering, light-matter interaction at interfaces, the description of realistic catalysts, incommensurate surface nanostructures, and surface electrochemistry, place specific demands on computational modeling capabilities, which machine learning and data-driven methods can help to address.

Fig. 1: Schematic depiction of exemplary grand challenge areas in surface science.
figure 1

From left to right: two-dimensional materials, electronic interface engineering, light-matter interaction at interfaces, description of realistic catalysts, incommensurate surface nanostructures, and surface electrochemistry. The frontier modeling requirements arising from these application challenges are listed in the bottom panel.

In this review, we focus on these broader computational simulation challenges spanning gas-surface, solid-liquid, and solid-solid interfaces. While each system presents unique challenges, many computational workflows are applicable across all types of surfaces and interfaces. We explore how new ML-enabled workflows are transforming these surface science applications, focusing on their role in (A) structure prediction of realistic surfaces, (B) thermodynamic stability and compositional phase space under realistic conditions, (C) large-scale electronic property prediction, (D) barrier prediction, (E) multiscale surface kinetics, (F) accurate dynamics at large time- and length scales, and (G) excited states, surface spectroscopy and nonadiabatic dynamics. We address the unique challenges of studying lower-dimensional systems compared to bulk or molecular systems and identify gaps in ML tools that, if filled, could greatly enhance computational surface science.

Structure prediction of surfaces and interfaces

Understanding the structure of an interface, surface, or adsorbed layer is crucial for studying properties such as charge transfer, surface states, level alignment, or the interface dipole. Surface structure prediction is challenging due to the vast number of possible structures and the computational expense of large unit cells containing hundreds of atoms. Advanced optimization methods and ML approaches have enabled effective local and global structure optimization and efficient sampling of the PES (see Fig. 2).

Fig. 2: Schematic representation of structure search tasks.
figure 2

A For local and global structure optimization, only part of the PES is sampled. B The phase space can be mapped by determining the entire PES.

Local structure optimization

Today, local structure optimization in surface science is considered a largely solved problem. Common local optimization routines such as gradient descent, Broyden-Fletcher-Goldfarb-Shanno, or Nelder-Mead can often be used in conjunction with first-principles computations. For particularly demanding problems, surrogate models can be employed: for example, using a Bayesian ML approach, Garijo del Río et al.18,19 accelerated local geometry optimizations of CO on Ag(111) and C on Cu(100). Their algorithm, called GPMin, conducts optimizations on a machine-learned PES and improves the underlying ML model by adding each newly found local minimum to the training data. GPMin is available in the atomic simulation environment (ASE)20.

Global structure optimization

Unlike local optimization, global optimization is significantly more challenging due to the vastly larger search space. The uniquely complex interactions at surfaces and interfaces (charge transfer, hybridization, level alignment, etc.) render purely first-principles approaches intractable15,21. ML based approaches, such as Bayesian regression, tree-based algorithms, genetic algorithms (GAs), and NNs have enabled significant advances in tackling complex systems with many degrees of freedom.

For instance, the group of Bjørk Hammer developed GOFEE (global optimization with first-principles energy expressions)22, which employs a Gaussian process regression (GPR) based surrogate model for energies. GOFEE makes efficient use of limited first principles data through adaptive sampling (see Accurate dynamics at large time- and length scales) and was applied to study the oxidation and oxygen intercalation of graphene on Ir(111). Similarly, Kaappa et al. used Gaussian processes to create a surrogate model for the PES to enable global optimization. Their algorithms, BEACON (Bayesian Exploration of Atomic Configurations for Optimization)23 and ICE-BEACON24, are available in the GPAtom package. Another approach successfully used for global optimization is Gaussian approximation potential (GAP)25. For example, Timmermann et al. applied GAP with simulated annealing to optimize low-index surfaces of rutile (IrO2)26.
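
To make the surrogate strategy behind these methods concrete, the following toy sketch performs a one-dimensional GPR-surrogate search: a Gaussian process with an RBF kernel is fit to all evaluated energies, and the next evaluation is placed where the predicted mean minus a multiple of the predicted uncertainty is lowest. This is only a schematic illustration of uncertainty-aware surrogate optimization, not the GOFEE or BEACON implementation; all function names and parameter values are our own choices.

```python
import math

def rbf(x1, x2, length=0.5):
    """Squared-exponential (RBF) kernel."""
    return math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(xq, xs, ys, jitter=1e-8):
    """GP posterior mean and variance at xq, given observations (xs, ys)."""
    n = len(xs)
    ybar = sum(ys) / n                       # centre targets on their mean
    K = [[rbf(xs[i], xs[j]) + (jitter if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, [y - ybar for y in ys])
    kq = [rbf(xq, xi) for xi in xs]
    mean = ybar + sum(a * k for a, k in zip(alpha, kq))
    w = solve(K, kq)
    var = max(1e-12, 1.0 - sum(k * wi for k, wi in zip(kq, w)))
    return mean, var

def gp_search(energy, lo=-2.0, hi=3.0, n_eval=15, kappa=2.0):
    """Place each new energy evaluation where mean - kappa*std is lowest."""
    xs, ys = [lo, hi], [energy(lo), energy(hi)]
    grid = [lo + (hi - lo) * i / 200 for i in range(201)]
    for _ in range(n_eval):
        def acq(g):
            m, v = gp_predict(g, xs, ys)
            return m - kappa * math.sqrt(v)   # lower confidence bound
        xn = min(grid, key=acq)
        xs.append(xn)
        ys.append(energy(xn))                 # expensive "first-principles" call
    return min(zip(ys, xs))                   # best energy found and its position
```

In a real structure search, the scalar coordinate would be replaced by a structural descriptor and the energy callback by a DFT evaluation; the acquisition logic stays the same.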

GPR is also the foundation of the BOSS code27. BOSS combines GPR with descriptors based on Cartesian and internal coordinates, using model uncertainty for active learning (see Accurate dynamics at large time- and length scales). BOSS requires only a few hundred energy evaluations to construct a five-dimensional PES, though the number of necessary training data points increases exponentially with the degrees of freedom. A similar approach was developed by Hörmann et al., using radial distance functions as descriptors to predict the PES of single molecules on surfaces28. We note that the SAMPLE21,29 and GAMMA30 approaches, discussed in the next chapter, facilitate structure prediction of molecular adlayers on surfaces.

A different approach was used by Li et al.31 to create a transferable ML model for the prediction of adsorption energies of single-atom alloys. They utilized XGBoost32, which implements a boosted-tree algorithm based on gradient boosting: a series of trees is built, where the first tree learns the original data and every subsequent tree learns the residual of the previous ensemble, thereby improving the accuracy. XGBoost is widely used in computational materials research and has, for instance, been employed to model molecular adsorption in metal-organic frameworks33 or for the discovery of heterogeneous catalysts34.
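
The residual-fitting idea behind gradient boosting can be sketched in a few lines. The example below builds depth-one regression trees (stumps), each fitting the residual left by the current ensemble; it is a toy illustration of the principle, not XGBoost itself, which adds regularization, second-order gradient information, and deeper trees.

```python
def fit_stump(xs, residuals):
    """Depth-one regression tree: best single split, leaves predict the mean."""
    best = None
    for split in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not right:
            continue                       # degenerate split, nothing on the right
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - ml) ** 2 for r in left)
               + sum((r - mr) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, split, ml, mr)
    _, split, ml, mr = best
    return lambda x: ml if x <= split else mr

def gradient_boost(xs, ys, n_trees=50, lr=0.3):
    """Boosted ensemble: each new stump fits the residual of the previous ones."""
    pred = [0.0] * len(xs)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        trees.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * t(x) for t in trees)
```

The learning rate shrinks each stump's contribution, so the ensemble approaches the target geometrically as residuals shrink.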

GAs have been widely applied for global optimization of surface structures. While many of these algorithms were originally developed to find minimum energy structures for bulk materials or proteins, they have since been adapted for surfaces. A prime example is USPEX (Universal Structure Predictor: Evolutionary Xtallography)35, which uses a GA to computationally predict crystal structures. Wen et al. used the USPEX code to determine the structure of a mixed-metal oxide monolayer grown on an oxide support36. A key challenge in performing GA optimizations with first-principles calculations is the high computational cost of surface calculations, prompting efforts to minimize the required computations. Chuang et al. developed a GA that avoids the evaluation of duplicate structures by ensuring that the structures in the pool either differ in energy or atomic displacements, allowing them to determine the reconstructions of semiconductor surfaces37. Computational effort can also be reduced by coupling first-principles calculations with less resource-intensive methods. For example, Bjørk Hammer’s group implemented a two-stage DFT optimization approach, which pre-screens candidate structures with less accurate, lower-cost methods to identify and remove duplicates. Promising new structures are then refined using more accurate DFT calculations38,39. Another efficient strategy is to pair a GA with a surrogate model for the target property. Jacobsen et al. used a kernel ridge regression ML model, built on the feature vectors of Oganov et al.41, with adaptive sampling (see Accurate dynamics at large time- and length scales) to generate training data and guide a GA for optimizing surface reconstructions of oxide materials40.
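
The duplicate-avoidance idea can be illustrated with a toy GA on a continuous test function: before computing the energy of a candidate, the pool of already-evaluated structures is searched for a near-identical geometry, whose stored energy is then reused. This is a schematic sketch in the spirit of the approach of Chuang et al., with hypothetical thresholds and a trivial coordinate-vector "structure", not their published algorithm.

```python
import random

def find_duplicate(candidate, pool, dx=0.05):
    """Return the stored energy of an already-evaluated structure whose
    coordinate displacements all fall below dx, or None if the candidate is new."""
    for member, energy in pool:
        if all(abs(u - v) < dx for u, v in zip(candidate, member)):
            return energy
    return None

def ga_minimize(energy_fn, ndim=2, pop=12, generations=30, seed=1):
    random.seed(seed)
    pool, saved = [], 0    # pool of unique evaluated structures; saved = skipped evaluations
    population = [[random.uniform(-2.0, 2.0) for _ in range(ndim)] for _ in range(pop)]
    for _ in range(generations):
        scored = []
        for struct in population:
            e = find_duplicate(struct, pool)
            if e is None:
                e = energy_fn(struct)          # expensive evaluation
                pool.append((struct, e))
            else:
                saved += 1                     # duplicate: energy reused, not recomputed
            scored.append((e, struct))
        scored.sort(key=lambda t: t[0])
        parents = [s for _, s in scored[: pop // 2]]
        # averaging crossover plus Gaussian mutation
        population = [[(u + v) / 2 + random.gauss(0.0, 0.1)
                       for u, v in zip(*random.sample(parents, 2))]
                      for _ in range(pop)]
    best_struct, best_e = min(pool, key=lambda t: t[1])
    return best_e, best_struct, saved
```

In an atomistic setting, the displacement check would compare atomic positions after alignment, and the energy function would be a DFT or MLIP call.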

Particle swarm optimization has also been applied to predict surface structures. A notable example is the CALYPSO (Crystal structure AnaLYsis by Particle Swarm Optimization) code42. Using CALYPSO, Lu et al. developed a method to explore surface structures such as diamond surface reconstructions featuring self-assembled carbon nanotube arrays43. The code has also been employed to predict solid-solid interface structures: optimal lattice-matched superlattices are determined, employing constraints on interatomic distances and atomic coordination numbers to generate starting interface structures44. Moreover, CALYPSO was used to reveal the CeO2 surface reconstruction. First-generation structures were created with CALYPSO, based on input parameters such as the bulk crystal surface structure, the number of atoms that form the interface, and the thickness of the reconstructed layers45. These structures were then optimized with DFT and compared to experiment.

While most ML applications in surface science use supervised learning methods, reinforcement learning methods are also gaining traction in the structure prediction community. Meldgaard et al. predicted crystal surface reconstructions by using reinforcement learning combined with image recognition46. The learning agent employs a deep NN to decide the placement and type of the next atom in an incomplete structure. The new structure—which represents a new state—is evaluated using DFT and a reward is determined. States and rewards are saved and used by the agent for future atom placements. This algorithm is an extension (to 3D structures) of the ASLA method47 which generated 2D materials and molecules with reinforcement learning.

Open challenges

Despite major advances in accurately predicting global minimum surface structures, current methods are limited to systems with a few hundred atoms27,48. Large-scale systems, such as incommensurate surface structures or realistic catalytic surfaces, containing tens of thousands of atoms remain beyond reach due to the limited efficiency and precision of current inference methods.

Surfaces regularly exhibit multiple (even thousands of) polymorphs with differences in stability below 1 kcal/mol per atom (≈43 meV)21,49, despite exhibiting different properties, such as the work function. DFT is the workhorse method in the field, but different functionals, especially the typically used Generalized Gradient Approximation (GGA), struggle to yield energy computations that are consistently within an accuracy of 1 kcal/mol per atom28,50,51. While more accurate first-principles methods exist, their cost conflicts with the demand for large datasets to improve ML model accuracy. More data-efficient approaches are required. Several examples of transfer learning exist for molecules52,53 and bulk materials54,55. Recently, foundation models (also called universal MLIPs) have been introduced that are trained on large and diverse databases and often deliver adequate accuracy for tasks such as predicting formation energies and performing preliminary geometry optimizations.

Nevertheless, gaps in accuracy and amount of available training data, as well as limited inference efficiency, remain open challenges in the prediction of realistic surface structures.

Thermodynamic stability and compositional phase space under realistic conditions

Understanding thermodynamic stability and compositional phase space enables the prediction of surface structures under realistic conditions, essential for targeted materials design and experimental interpretation. Realistic conditions, representative of common experimental setups and practical applications, typically encompass temperatures around room temperature, pressures spanning from ultra-high vacuum to atmospheric levels, and chemical potentials reflecting the specific surrounding gaseous or liquid environment.

Hörmann et al. developed a surface structure search algorithm called SAMPLE21,29, which enables the prediction of compositional and thermodynamic phase diagrams. SAMPLE can learn energies and work functions56 and generate a comprehensive list of surface structures based on coverage and the number of molecules per unit cell. It uses a pairwise potential fitted with Bayesian linear regression to a training set of a few hundred DFT evaluations. The training set is chosen using optimal design theory to maximize the information gained, addressing a key challenge in ML for surface science: efficient data use. Combined with ab-initio thermodynamics57, SAMPLE provides access to thermodynamic quantities via the partition function. The authors later extended this method to predicting phase diagrams of near-incommensurate surface structures49,58. A similar approach to SAMPLE is GAMMA (Generalized block AsseMbly Machine learning equivalence clAss sampling modeling)30. It employs an Ising-type model based on energy calculations of molecule-molecule and molecule-substrate interactions performed in isolation. This contrasts with SAMPLE, which derives these interactions from formation energy calculations of the adsorbed system. GAMMA explores the configuration space stochastically using an equivalence class sampling algorithm that removes redundant information, allowing it to consider large unit cells and to explore how interaction strength and temperature affect the self-assembly of molecules on substrates.
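
Two central ingredients of such approaches, a linear interaction model fitted in a Bayesian way and thermodynamic weights obtained from the partition function, can be sketched as follows. The motif counts, energies, and hyperparameters below are hypothetical illustrations, not SAMPLE's actual descriptors or data.

```python
import math

def fit_pairwise(X, y, noise=0.01, prior=10.0):
    """Posterior mean of a two-parameter Bayesian linear regression.

    Each row of X counts two interaction motifs (e.g., molecule-substrate
    contacts and molecule-molecule pairs); y holds formation energies."""
    s2, p2 = noise ** 2, prior ** 2
    A00 = sum(x[0] * x[0] for x in X) / s2 + 1.0 / p2
    A01 = sum(x[0] * x[1] for x in X) / s2
    A11 = sum(x[1] * x[1] for x in X) / s2 + 1.0 / p2
    b0 = sum(x[0] * yi for x, yi in zip(X, y)) / s2
    b1 = sum(x[1] * yi for x, yi in zip(X, y)) / s2
    det = A00 * A11 - A01 * A01               # 2x2 solve via Cramer's rule
    return ((b0 * A11 - b1 * A01) / det, (A00 * b1 - A01 * b0) / det)

def boltzmann_weights(energies, kT=0.025):
    """Relative thermal occupations of structures from the partition function."""
    e0 = min(energies)                        # shift for numerical stability
    ws = [math.exp(-(e - e0) / kT) for e in energies]
    Z = sum(ws)
    return [w / Z for w in ws]
```

With the fitted interaction parameters, the energy of any enumerated structure follows from its motif counts, and the Boltzmann weights then yield relative phase occupations at a given temperature.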

A different approach was taken by Ulissi et al., who used GPR to learn the coverage-dependent free energy for IrO2 and MoS2 surface reconstructions59. Training on a small number of DFT calculations, selected using the inherent uncertainty of GPR, allows the authors to construct Pourbaix diagrams that illustrate the stability of surfaces in electrochemistry applications as a function of pH and electrochemical potential.

ML has also facilitated research into the phase behavior of solid-solid interfaces. Moayedpour et al., for instance, studied epitaxial interfaces of tetracyanoquinodimethane on tetrathiafulvalene60. The authors used an improved version of the Ogre61 code to generate possible surface structures and predicted energies using the ANI62 MLIP in conjunction with the D363 vdW correction. The Ogre code facilitates generating ideal (perfectly ordered) interfaces, without considering growth conditions. Huber et al. used gradient-boosted decision trees, based on descriptors that depend only on local properties of the grain boundary, to predict the segregation energy distribution, and thus the segregation isotherm, of metallic solutes at grain boundaries in aluminum64. Their model provides improved predictive power over the common Langmuir-McLean model.

A data-driven, though not strictly ML-based, approach for determining surface thermodynamic properties is nested sampling65. Yang et al. used this method to calculate adsorbate phase diagrams, incorporating all relevant configurational contributions to the free energy66. Nested sampling requires large numbers of energy evaluations; the authors kept this cost tractable by describing the interactions with a Lennard-Jones potential.


Open challenges

Exploring thermodynamic stability and compositional phase space presents challenges similar to those of predicting surface structures (see Structure prediction of surfaces and interfaces). In addition, studying thermodynamic stability requires learning properties such as vibrational enthalpies and free energies, which demand accurate modeling of anharmonic effects, electronic effects, and environmental conditions. Most ML applications in surface thermodynamics focus on predicting energies of phase candidates. While progress has been made in learning thermodynamic properties for molecular or bulk systems67,68, as well as in ML-enhanced CALPHAD (Calculation of Phase Diagrams) modeling for bulk materials69,70, little work exists on directly learning surface vibrational enthalpies or other thermodynamic properties.

Comprehensive, accurate databases for vibrational eigenvalues and eigenvectors, phase transitions, or other thermodynamic properties of surface structures are lacking. While thermodynamic properties such as adsorption energies or vibrational properties can be found in common databases (see Table 3), the amount of data that exists for surfaces is limited. For instance, the recently published Molecular Vibration Explorer71 on the Materials Cloud72 is dedicated to only a specific class of molecules. As a result of this data gap, little work exists on benchmarking ML approaches for thermodynamic properties.

Large-scale electronic property and spectroscopy prediction

Geometric features of surfaces and interfaces, such as defects, reconstructions, incommensurate structures, step edges, and grain boundaries, affect the electronic structure, thereby contributing to measurable electronic transport, reactivity, and spectroscopic signatures. Accurately capturing these effects in simulations requires the prediction of electronic properties in large unit cells containing thousands to tens of thousands of atoms. ML surrogate models of electronic structure and spectroscopic properties are much less developed than ML models for energy and force prediction. To date, only a few proof-of-principle models have been applied in the context of surface science. Their obvious applications lie in high-throughput property prediction and in overcoming the intrinsic scaling limitations of DFT.

Learning level alignment, interfacial orbital hybridization, band bending

Scalar electronic properties can often be learned with methods originally designed to learn energies. For instance, the SAMPLE code21 is able to predict the work function of large molecular superstructures on surfaces. Choudhary et al. followed a similar philosophy when developing InterMat73. This approach uses DFT and NNs to predict band offsets of semiconductor interfaces. A graph NN model is used to predict the valence band maxima and conduction band minima of the respective surfaces, using the atomic structures as input data. Training data was generated with DFT. By aligning the vacuum levels of these surfaces, the band offset is determined (according to Anderson’s rule).
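
Anderson's rule itself reduces to simple arithmetic once the band edges of both surfaces are referenced to a common vacuum level. A minimal sketch (our own function names; energies in eV relative to vacuum) is:

```python
def anderson_offsets(vbm_a, cbm_a, vbm_b, cbm_b):
    """Band offsets by vacuum-level alignment (Anderson's rule).

    Band edges are given in eV relative to the shared vacuum level
    (negative numbers lie below vacuum)."""
    valence_offset = vbm_a - vbm_b
    conduction_offset = cbm_a - cbm_b
    # classify the band lineup
    if (vbm_a < vbm_b and cbm_a > cbm_b) or (vbm_b < vbm_a and cbm_b > cbm_a):
        lineup = "type-I (straddling)"       # one gap lies inside the other
    elif max(vbm_a, vbm_b) < min(cbm_a, cbm_b):
        lineup = "type-II (staggered)"       # gaps overlap, edges staggered
    else:
        lineup = "type-III (broken gap)"     # gaps do not overlap
    return valence_offset, conduction_offset, lineup
```

In a workflow like the one described above, an ML model supplies the band-edge positions for each slab, and this alignment step produces the offset.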

Gerber et al. used a purely data-driven approach for InterMatch74, a high-throughput computational framework designed to efficiently predict charge transfer, strain, and superlattice structures at material interfaces. The algorithm leverages databases of individual bulk materials to streamline the prediction process. By accessing lattice vectors, density of states, and stiffness tensors from the Materials Project, InterMatch estimates interfacial properties. The code was used to study transition metal dichalcogenides and qualitatively reproduce DFT simulations and experiment. This approach enables high-throughput pre-screening.

Hamiltonian and band structure prediction

Nonlinear learning techniques to parameterize Hamiltonians have a long legacy in semi-empirical and tight-binding models, and several end-to-end learning strategies of Slater-Koster integral tables from calculated properties have been proposed for molecules and materials75,76,77. Beyond the limitations of the 2-center integral approximations, various linear, kernel-based78, and deep-learning-based representations of electronic structure have been proposed for molecules79,80. These models typically target the Kohn-Sham Hamiltonian in local orbital representation, which provides an atom-centered representation that can encode the rotational transformation properties of local orbital integrals81 and does not suffer from the non-smoothness and phase issues associated with learning eigenvalues and eigenvectors directly82. Additionally, this representation can be extended to condensed-phase systems, as the Bloch transform of the real-space Hamiltonian readily enables the calculation of the band structure and the electronic density of states (DOS). Examples of recent models include the linear ACEhamiltonians model based on the Atomic Cluster Expansion83, DeepH84, and DeepH-E3 models85. The latter has been applied to two-dimensional materials such as twisted graphene bilayers.
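
The appeal of learning the real-space Hamiltonian is that band structures follow from an inexpensive Bloch transform, H(k) = Σ_R H(R) e^{ikR}. The toy below shows this post-processing step for a one-band, one-dimensional tight-binding chain; the ML part, predicting the real-space matrix elements, is omitted, and the function names are ours.

```python
import cmath
import math

def bloch_hamiltonian(hoppings, k, a=1.0):
    """Bloch transform H(k) = sum_R H(R) exp(i k R a) of real-space
    matrix elements, here for a one-band chain (hoppings maps R -> H(R))."""
    return sum(h * cmath.exp(1j * k * R * a) for R, h in hoppings.items())

def band(hoppings, nk=101, a=1.0):
    """Band energies on a uniform k-grid spanning the first Brillouin zone."""
    ks = [-math.pi / a + 2.0 * math.pi / a * i / (nk - 1) for i in range(nk)]
    return ks, [bloch_hamiltonian(hoppings, k, a).real for k in ks]
```

For a nearest-neighbor chain with on-site energy 0 and hopping t = -1, this reproduces the textbook dispersion E(k) = -2cos(k) with bandwidth 4|t|; in a multi-band setting, H(k) becomes a matrix that is diagonalized at each k.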

Prediction of electronic density of states, the electron density, and spectroscopic properties

The ability to predict the electronic DOS or the electron density directly (rather than through diagonalization of a Hamiltonian) is valuable in studying charge transfer and the relationship between surface structure and electronic properties. SALTED is an approach to learning the electron density expressed as a linear combination of atom-centered basis functions. The corresponding basis coefficients are trained using Gaussian process regression86. This approach has been applied to condensed phase systems, two-dimensional materials, and electrochemical interfaces87. In the latter case, it is able to accurately predict the charge distribution of a metal electrode in contact with an explicit aqueous electrolyte. Several approaches have been proposed that represent the DOS in condensed matter systems either by directly learning a spectrum or by representation in a suitable atom-centered basis88,89,90. Such approaches have been successfully employed for Al metal slabs and are readily applicable in surface science.
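
The target quantity of such DOS models is straightforward to construct from a set of eigenvalues, for example by Gaussian broadening. A minimal sketch of this reference quantity (the broadening width and function name are our own choices) is:

```python
import math

def gaussian_dos(eigenvalues, grid, sigma=0.1):
    """Gaussian-broadened density of states sampled on an energy grid.

    Each eigenvalue contributes a normalized Gaussian of width sigma, so the
    spectrum integrates to the number of states."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((e - ev) / sigma) ** 2)
                       for ev in eigenvalues)
            for e in grid]
```

An ML model either regresses this broadened spectrum directly or predicts expansion coefficients of it in an atom-centered basis, as in the works cited above.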

Related to learning the DOS and electron density are ML models for spectroscopic properties. Methods such as ultraviolet photoelectron spectroscopy (UPS), x-ray photoelectron spectroscopy (XPS), and near-edge x-ray absorption fine structure (NEXAFS) provide invaluable information on the chemical state and bonding environment of atoms in samples. Often, spectroscopic assignment is challenging, and first-principles simulations can provide valuable input. ML methods provide the means to accelerate predictions, with kernel ridge regression and neural network models having been proposed to predict (inverse) photoemission signatures82, XPS91,92, and NEXAFS93,94, but have mostly focused on bulk systems. Additionally, ML approaches enable novel discoveries, such as inverse structure determination based on target spectra95. ML methods have seen widespread application in spectroscopy, which has been summarized in comprehensive reviews on the subject82,96,97,98.

Open challenges

A critical open challenge for electronic structure and spectroscopy surrogate models and their application to surface science is the integration with existing tools and workflows. Most current models focus on bulk systems but should in principle transfer to surface science; however, sufficient training data are lacking. So far, little work has gone into making proof-of-principle models user-friendly, scalable, and well-integrated with electronic structure and simulation software. General future directions have been previously articulated99, and integration with dynamics will be covered in Excited states and nonadiabatic dynamics. Specifically in the context of surface science, these models will face similar challenges as first-principles and semi-empirical methods, as interfaces offer a rich diversity of local atom and bond environments that is hard to capture15. In addition, long-range electrostatic and dispersion contributions to the electronic properties must be considered. We expect developments in this field to accelerate considerably in the coming years, with many ML developments, such as equivariant NNs, transferring straightforwardly into electronic structure surrogates.

Reaction barrier prediction

Most experiments and industrial applications (e.g., surface deposition processes, catalysis) operate away from thermodynamic equilibrium, making it essential to consider kinetic processes at surfaces. This involves identifying metastable states, barriers, transition states, and transition rates at surfaces. The extensive energy and force evaluations needed for such studies demand highly efficient ML approaches.

Transition state search

Transition state search is essential to the study of surface dynamics. The transition energy, for instance, defines reaction rates. Although many successful algorithms have been developed for transition path and transition state search, such as the nudged elastic band (NEB) method, there is still an urgent need for automation and speed-up. Multiple novel methods25,100,101,102,103,104,105,106,107 allow the construction of MLIPs based on first-principles data. Such MLIPs can replace more computationally expensive DFT calculations within the NEB algorithm. In most cases, training MLIPs during the evaluation of NEBs is as computationally costly as directly evaluating the NEBs using DFT codes. However, the resulting models can later be reused for other purposes, e.g., for kinetics. One of the earliest examples of ML-accelerated transition state search was introduced by Peterson108, who created an approximate NN-based PES using the Atomistic Machine-learning Package (Amp)106 to speed up NEB calculations. After finding an initial saddle point, results are confirmed with first-principles calculations, and the model is retrained with the new data points. This is repeated until agreement with the first-principles calculations is reached. The method was employed for two systems: diffusion of a gold atom on an Al(100) surface infused with Pt atoms, and bond rotation in ethane; in both cases, significantly fewer force calls were required than for standard DFT-based NEB. Schaaf et al. presented a general protocol for the prediction of energy barriers of catalytic systems by training MLIPs using active learning based on the energy uncertainty of individual atoms. The protocol was applied to the conversion of CO2 to methanol at an In2O3 surface with a single oxygen vacancy109.

A recent effort toward large pre-trained MLIPs capable of predicting reaction barriers across diverse catalytic systems was enabled by the OC20NEB database110, containing 932 NEB calculations at the GGA-DFT level. Wander et al. used this database within the CatTSunami framework to validate NN-based MLIP models trained on the OC20 database111 (see Accurate dynamics at large time- and length scales): Equiformer v2112 (in 153M- and 31M-parameter variants), GemNet-OC113, PaiNN114, and DimeNet++115 models110. The results of this study are reproduced in Table 1, which shows the convergence and success rates for predicting barriers of molecules detaching from the surface (desorption), dissociating at a surface (dissociation), or exchanging an atom between two reactants (transfer). Equiformer v2 shows the best performance, even in its lighter (31M) variant. PaiNN and DimeNet++ underperformed, in particular for dissociation and transfer reactions, in which more degrees of freedom participate than in desorption reactions.

Table 1 Performance of Equiformer v2 (Eq2), GemNet-OC, PaiNN, and DimeNet++, trained on OC20 database, in predicting reaction (desorption, dissociation or transfer) barriers, tested on validation set based on OC20NEB

To reduce the number of energy and force evaluations required during transition state search, Koistinen, Jónsson, and co-workers introduced GPR-aided NEB, which evaluates only the geometry of the highest-uncertainty image of the predicted minimum energy path. The authors tested the algorithm on two cases: a 2D problem and a heptamer island on a (111) surface described by Morse potentials116,117. An improved GPR-based NEB (ML-NEB) was introduced by Garrido Torres and co-workers118. Their algorithm adjusts the entire minimum energy path after every force evaluation, minimizing the number of evaluations needed to converge. The method was tested on Au diffusion on Al(111), Pt adatom diffusion on a stepped Pt surface, and a Pt heptamer island on Pt(111), demonstrating a remarkable reduction in force evaluations compared to established optimization algorithms. For the two-dimensional Müller-Brown potential, the search for the minimum energy path (Fmax = 0.05 eV/Å) using the climbing-image NEB requires 286 evaluations (11 images), whereas ML-NEB requires only 16 evaluations (Fig. 3). Despite the success of the ML-NEB method in reducing the number of force evaluations needed to find reaction barriers, it scales cubically with the number of atoms; it therefore struggles for higher-dimensional problems, in which the model retraining step becomes a bottleneck that significantly increases the total time of the NEB calculation.
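
The Müller-Brown surface used in this benchmark is a standard two-dimensional analytic test PES. A minimal implementation with its textbook parameters, plus a simple steepest-descent check of one of its known minima, is sketched below; this is illustrative only and does not reproduce the NEB calculations of ref. 118.

```python
import math

# Standard Mueller-Brown parameters: V = sum_k A_k exp(a_k dx^2 + b_k dx dy + c_k dy^2)
A  = (-200.0, -100.0, -170.0, 15.0)
a  = (-1.0, -1.0, -6.5, 0.7)
b  = (0.0, 0.0, 11.0, 0.6)
c  = (-10.0, -10.0, -6.5, 0.7)
X0 = (1.0, 0.0, -0.5, -1.0)
Y0 = (0.0, 0.5, 1.5, 1.0)

def mueller_brown(x, y):
    """Analytic Mueller-Brown potential energy at (x, y)."""
    return sum(A[k] * math.exp(a[k] * (x - X0[k]) ** 2
                               + b[k] * (x - X0[k]) * (y - Y0[k])
                               + c[k] * (y - Y0[k]) ** 2)
               for k in range(4))

def gradient(x, y, h=1e-6):
    """Central-difference gradient; adequate for this illustration."""
    gx = (mueller_brown(x + h, y) - mueller_brown(x - h, y)) / (2 * h)
    gy = (mueller_brown(x, y + h) - mueller_brown(x, y - h)) / (2 * h)
    return gx, gy

def descend(x, y, step=1e-4, iters=20000):
    """Plain steepest descent into the nearest local minimum."""
    for _ in range(iters):
        gx, gy = gradient(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y
```

Because the potential and its gradient are analytic and cheap, this surface is a convenient testbed for comparing path-optimization strategies before moving to expensive first-principles forces.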

Fig. 3: Conventional and ML accelerated minimum energy path search.
figure 3

Minimum energy paths evaluated with climbing image NEB (above) and ML-NEB (below) based on the results from ref. 118 for the two-dimensional Müller-Brown PES. Black crosses correspond to all structures that were evaluated in the optimization process, and white crosses represent the final path.
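The uncertainty-driven idea behind these GPR-accelerated searches can be sketched in a few lines. The following is an illustrative loop only, not the ML-NEB algorithm of ref. 118: images are held fixed on a straight path across the Müller-Brown surface, a Gaussian process surrogate is fit to the points evaluated so far, and each new true evaluation is spent on the image with the largest predicted uncertainty.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Mueller-Brown model potential (standard parameters)
A  = np.array([-200., -100., -170., 15.])
a  = np.array([-1., -1., -6.5, 0.7])
b  = np.array([0., 0., 11., 0.6])
c  = np.array([-10., -10., -6.5, 0.7])
x0 = np.array([1., 0., -0.5, -1.])
y0 = np.array([0., 0.5, 1.5, 1.])

def mueller_brown(p):
    x, y = p
    return np.sum(A * np.exp(a*(x-x0)**2 + b*(x-x0)*(y-y0) + c*(y-y0)**2))

# Straight-line initial path between the two outer minima (11 images)
start, end = np.array([-0.558, 1.442]), np.array([0.623, 0.028])
path = np.linspace(start, end, 11)

# Seed the surrogate with the endpoints only
X = [start, end]
y = [mueller_brown(start), mueller_brown(end)]

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                               alpha=1e-6, normalize_y=True)
for _ in range(16):              # budget of 16 true evaluations
    gpr.fit(np.array(X), np.array(y))
    _, sigma = gpr.predict(path, return_std=True)
    i = int(np.argmax(sigma))    # most uncertain image along the path
    X.append(path[i]); y.append(mueller_brown(path[i]))

energies, sigma = gpr.predict(path, return_std=True)
```

In an actual ML-NEB run the path itself is relaxed on the surrogate between true evaluations; here only the acquisition step is shown.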

Direct activation energy prediction

Apart from deriving activation energies and rates from transition states, both could in principle also be predicted directly, without a prior search for the transition state structure. An ML-based direct prediction of activation energies in complex catalytic systems was proposed by Singh et al.119. They employed a forward-search algorithm to select both linear and non-linear features and compared various ML techniques (linear regression, Gaussian process, random forest) for their effectiveness in predicting reaction rates. Focusing on the dehydrogenation and dissociation of N2 and O2 on surfaces, a polynomial-feature-based linear regression model performed best, yielding roughly a twofold accuracy improvement over previous single-parameter linear-regression models. Later, Komp et al. employed a more complex ML model based on deep NNs to predict the full quantum reaction rate constant for one-dimensional reactive pathways using roughly 1.5 million training data points. The authors tested the model on the diffusion of H on Ni(100) and other non-surface reactions120.
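A polynomial-feature linear-regression strategy of this kind can be sketched with scikit-learn on synthetic data. The descriptors (a reaction energy dE and a binding energy Eb) and the BEP-like target below are invented for illustration and are not the features of ref. 119.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical per-reaction descriptors: reaction energy dE and
# adsorbate binding energy Eb (synthetic, illustrative only)
n = 200
dE = rng.uniform(-2.0, 2.0, n)
Eb = rng.uniform(-3.0, 0.0, n)
X = np.column_stack([dE, Eb])

# Toy "true" barrier: BEP-like linear term plus a weak nonlinear coupling
Ea = 0.9 + 0.5*dE - 0.2*Eb + 0.1*dE*Eb + rng.normal(0, 0.02, n)

# Degree-2 polynomial features feeding an ordinary linear regression
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X[:150], Ea[:150])
mae = np.mean(np.abs(model.predict(X[150:]) - Ea[150:]))
```

The polynomial expansion lets a linear model capture descriptor couplings (here dE*Eb) while remaining cheap and interpretable.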

Complementary experimental data can significantly enhance the predictions of kinetic properties. For example, Smith et al. proposed using experimental descriptor data combined with dimensionality reduction, principal component analysis, and NN to predict reaction rates of the water-gas shift reaction using different catalysts and reaction conditions121.

Open challenges

The prediction of reaction barriers has advanced notably with ML techniques. However, accurate and efficient direct reaction barrier prediction for molecules, materials, and surface chemistry has not yet been achieved. Promising methods, e.g., ML-NEB118, have shown success in predicting reaction barriers of lower-dimensional systems; however, further development is needed to deliver such improvements robustly and consistently for higher-dimensional systems. NEB calculations typically demand substantial human intervention. Further development is necessary to create automated workflows that can efficiently identify barriers and transition states. Moreover, few comprehensive datasets, the OC20NEB database110 being an exception, exist for barriers and transition paths, which would facilitate the development and testing of current and new methods.

Multiscale kinetic simulations

Kinetic simulations are essential for understanding surface reaction mechanisms and growth. Fundamental challenges in kinetic simulations include capturing processes that occur over vastly different time and length scales, accounting for a large chemical reaction space, and quantifying uncertainties and understanding how they propagate across scales. ML can enhance large-scale kinetic simulations by integrating with mean-field microkinetic models, kinetic Monte Carlo (kMC) methods, and reaction networks. While microkinetic models are more computationally efficient, kMC simulations fully capture the reactive site dependence and fluctuations, and reaction networks offer a framework for organizing complex reaction pathways and tracking the evolution of species over time122.

Mean-field microkinetic modeling

By assuming a uniform distribution of reactants and intermediates on the surface, mean-field microkinetic modeling simplifies the treatment of coverage-dependent effects and complex reaction networks. A mean-field model is the basis of the RMG-CAT software developed by Goldsmith and West123. RMG-CAT employs graph theory supported by least-squares regression to simulate microkinetic mechanisms for heterogeneous catalysis and has been successfully applied to model the dry reforming of methane on a Ni surface. Tian and Rangarajan introduced a mean-field microkinetic approach that utilizes NNs (NN-MK). In their approach, rates of elementary steps in the fast diffusion limit are generated with lattice MC simulations and then passed into an NN-based model that maps coverages to reaction rates for the entire reaction network. This model was then used in the mean-field microkinetic model. They demonstrated the capabilities of the model by studying CO oxidation, reaching accuracy comparable to kMC simulations124,125.
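A minimal mean-field microkinetic model illustrates the setting into which such ML rate surrogates are plugged. The sketch below integrates toy Langmuir-Hinshelwood rate equations for CO oxidation (CO adsorption/desorption, dissociative O2 adsorption, surface reaction); all rate constants are arbitrary illustrative values, not those of refs. 124,125.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative rate constants (arbitrary units)
k_CO_ads, k_CO_des = 1.0, 0.1   # CO adsorption / desorption
k_O2_ads           = 0.5        # dissociative O2 adsorption
k_rxn              = 2.0        # CO* + O* -> CO2 (desorbs immediately)

def rhs(t, theta):
    """Mean-field rate equations for the coverages (theta_CO, theta_O)."""
    th_CO, th_O = theta
    free = 1.0 - th_CO - th_O                      # empty-site fraction
    d_CO = k_CO_ads*free - k_CO_des*th_CO - k_rxn*th_CO*th_O
    d_O  = 2*k_O2_ads*free**2 - k_rxn*th_CO*th_O   # O2 needs two sites
    return [d_CO, d_O]

# Integrate from an empty surface to (near) steady state
sol = solve_ivp(rhs, (0, 200), [0.0, 0.0], rtol=1e-8, atol=1e-10)
th_CO, th_O = sol.y[:, -1]
tof = k_rxn * th_CO * th_O   # steady-state CO2 turnover frequency
```

In an NN-MK-style scheme, the mass-action rate expressions above would be replaced by a learned coverage-to-rate mapping.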

Kinetic Monte-Carlo simulations

A more detailed approach to modeling kinetic processes is kMC, which captures local fluctuations and spatial correlations. An early example of ML-based kMC is the approach developed by Sastry et al.126. They used genetic programming to construct a symbolic regression for reaction barriers (given a small set of calculated barriers) to enable kMC simulations of vacancy-assisted migration on an fcc CuxCo1-x surface. Djurabekova et al. developed a simple NN coupled with kMC to efficiently predict vacancy migration energies, reducing computational costs and enabling the exploration of kinetic pathways during Cu precipitation in Fe127. Castin et al.128 developed a similar approach, employing an NN model to predict vacancy migration energies based on NEB data, allowing the acceleration of kMC simulations of thermal annealing in Fe-20%Cr alloys, and later for precipitation studies in Fe-based alloys129,130 and point-defect transition rates in FeCu alloys131. Castin et al. further extended this method with high-dimensional NN MLIPs to predict vacancy migration barriers in kMC, tested on FeCu and FeCr alloys132.

Apart from using ML methods to predict reaction rates and to accelerate kMC simulations, it has also been proposed to entirely replace kMC with an ML-based model. Chaffart et al. utilized NNs to predict coefficients of stochastic partial differential equations as a function of substrate temperature and surface precursor fraction. The authors then combined the resulting method with continuum transport equations to predict epitaxial thin film evolution and growth from a gaseous molecular precursor133. Building on this approach, Kimaev et al. introduced an NN that can completely replace the stochastic multiscale model, which coupled kMC with partial differential equations to simulate thin film formation by chemical vapor deposition134,135,136. Independently, Ding et al. developed an integrated multiscale recurrent NN-based model for gas-phase transport profiles and microscopic surface dynamics using kMC and validated it for the plasma-enhanced atomic layer deposition of HfO2 thin films137. The improved efficiency of MLIPs has recently enabled direct modeling of some kinetic processes. For instance, Zhou et al. employed a GPR-based active learning tool for constructing MLIPs, named Flare107, to study reconstruction kinetics at the PdAu(111) surface induced by CO138. To efficiently learn the relation between reaction barriers and the output of the kMC model, Soleymanibrojeni et al. coupled an active learning process, built around a Gaussian process classification model, with kMC to model solid-electrolyte interphase formation in Li-ion batteries, constructing the initial dataset with a design-of-experiments approach139.
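The role of an ML barrier model inside a kMC loop can be sketched as a single rejection-free (BKL-type) step, in which the barriers come from a stand-in surrogate function of a local-environment fingerprint. The surrogate, the Arrhenius prefactor, and the candidate events are all illustrative assumptions.

```python
import math
import random

random.seed(1)
kB_T = 0.0257  # eV, room temperature

def surrogate_barrier(env):
    """Stand-in for an ML model: barrier (eV) as a function of a local
    environment fingerprint (here simply the neighbor count)."""
    return 0.5 + 0.05 * env

def kmc_step(events, t):
    """One rejection-free kMC step: pick an event with probability
    proportional to its rate, advance time by -ln(u)/sum(rates)."""
    rates = [1e13 * math.exp(-surrogate_barrier(env) / kB_T)
             for env in events]
    total = sum(rates)
    r = random.random() * total
    acc, chosen = 0.0, len(rates) - 1
    for i, k in enumerate(rates):
        acc += k
        if r < acc:
            chosen = i
            break
    dt = -math.log(random.random()) / total
    return chosen, t + dt

# Three candidate hops distinguished by their neighbor counts
events = [3, 4, 5]
chosen, t = kmc_step(events, 0.0)
```

In the ML-accelerated schemes above, `surrogate_barrier` would be a trained regressor evaluated on a proper structural fingerprint, refreshed by active learning when the model is uncertain.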

Reaction networks

ML models have shown great potential in uncovering the complexity of reaction networks, which limits the modeling of many processes. Ulissi et al. presented a GPR-based workflow that enables predicting complex reaction pathways, applied to syngas conversion on Rh(111)140. Another approach was shown by Liu and co-workers, who employed stochastic surface walking to construct global NN-based PESs (SSW-NN)141 to study the water-gas shift reaction on the Cu(111) surface. The model was trained on DFT data containing 375,000 minima and more than 10,000 reaction pairs, and enabled the authors to find the lowest-energy pathway for the entire reaction network142. This approach was later extended to an end-to-end framework for the activity prediction of heterogeneous catalytic systems (AI-Cat), in which two NN models are used: the first predicts possible reaction patterns, and the second predicts the reaction barrier and energy. The NN models are employed in a Monte Carlo tree search to find low-energy pathways through the reaction network. The authors applied AI-Cat to study the selectivity of glycerol hydrogenolysis on Cu surfaces143.

Open challenges

Key challenges in kMC and microkinetic modeling are to ensure that the chemical reaction space (sites, elementary processes) adequately represents the system and that the underlying rate constants are valid for all relevant regimes. Rapid developments in data-driven approaches and global reaction exploration models are underway that will benefit the former123,144. Robust and transferable ML surrogates that can rapidly predict rate constants as a function of various conditions will dramatically benefit the latter.

The propagation of kMC and microkinetic models featuring processes with vastly different time scales is a challenge that continues to be an active research area. Approaches that aim to replace kMC with ML-based models have shown promise. Additionally, sophisticated variable-coefficient differential equation solvers, as well as robust uncertainty quantification and sensitivity analysis, will be crucial for advancing this field145.

Kinetic models not only couple to first principles data via input parameters but also provide information for mass transport modeling of macroscopic surface models. ML models will support not only parameterization but also uncertainty quantification across different scales.

Accurate dynamics at large time- and length scales

Simulating surface dynamics is challenging due to the high dimensionality, the complex electronic structure of metallic surfaces, and the frequent requirement for ensemble averaging over thousands of molecular dynamics (MD) trajectories. Accurate modeling of dynamics at surfaces requires models that match the accuracy of first principles methods. However, employing ab initio MD for experimentally relevant systems is infeasible in most cases, as the long time scales and large system sizes required lead to prohibitive computational costs. This challenge has driven the development of ML interatomic potentials (MLIPs) trained to first principles accuracy.

Interatomic potentials

Simulations of dynamics on surfaces require highly efficient surrogate models for energies, forces, and other properties. Fundamental to this are MLIPs. We note that many MLIPs are not solely dedicated to surface structure prediction and that their construction has recently been reviewed extensively in other areas of computational materials modeling7,8,146,147,148. Most existing methods to construct MLIPs are based on NN-potentials or Bayesian regression methods. NN-potentials are unbiased and sufficiently flexible to learn from electronic structure data. They typically require on the order of 10^4 training data points for high-dimensional PESs with tens to hundreds of degrees of freedom. Inference, i.e., evaluation of MLIPs, typically requires significantly greater computational effort than empirical force field potentials. Boes et al. highlighted the advantages and disadvantages of NNs over force fields by comparing a Behler-Parrinello net to the ReaxFF149 force field150. While the NN performed significantly better than ReaxFF for bulk properties, it required significantly more (approximately ten times as much) training data to obtain an accurate model for surface structures and adatom diffusion barriers. Bayesian regression methods are typically less data-hungry but provide less flexibility and transferability21,25,27.

Reactive gas-surface scattering simulations have long motivated the development of interatomic potentials. Early reactive PES construction methods include the corrugation reducing procedure151, modified Shepard interpolation152,153, or interpolation with permutation invariant polynomials154,155,156. These methods enable low computational costs of MD simulations; however, they require a frozen (static) surface approximation, excluding the explicit treatment of surface atom motion. Most recent dynamical simulations employ MLIPs, which deliver ab initio accuracy and high efficiency while allowing the inclusion of all degrees of freedom in simulations. Therefore, MLIPs have come to dominate surface dynamics research. Early MLIPs for surface dynamics date back to 1995, when Blank et al. employed simple NNs to construct PESs for CO adsorbed on a Ni(111) surface and H2 on a Si(100) surface, utilizing the latter for quantum transition state theory rate calculations157. A highly successful MLIP based on Bayesian regression is GAP25. GAP allows interpolating the PES by using GPR with a descriptor based on the local atomic density. GAP was originally applied to bulk crystals and was only later applied to problems in surface science26,158,159. A milestone in the development of MLIPs was the high-dimensional, atom-centered NNs introduced by Behler and Parrinello. Their method calculates the total energy by summing atomic energy contributions, with interatomic interactions described within a defined cutoff radius160. This atom-centered energy decomposition with finite cutoffs has since become a standard approach in most contemporary MLIPs.

$$E=\sum_{i=1}^{N_{\mathrm{at}}}E_{i}=\sum_{i=1}^{N_{\mathrm{at}}}\mathrm{NN}_{i}$$
(1)
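Equation (1) can be sketched directly: each atom receives a descriptor of its local environment within a cutoff, a small atomic network maps the descriptor to an atomic energy, and the atomic energies are summed. The toy radial symmetry functions and fixed random weights below are illustrative stand-ins for a trained Behler-Parrinello model; the sketch only demonstrates that the resulting total energy is permutation and translation invariant.

```python
import numpy as np

rng = np.random.default_rng(0)
cutoff = 4.0

def descriptor(pos, i):
    """Toy radial symmetry functions for atom i: Gaussians of neighbor
    distances, smoothly switched off at the cutoff."""
    d = np.linalg.norm(pos - pos[i], axis=1)
    d = d[(d > 1e-8) & (d < cutoff)]                # neighbors only
    fc = 0.5 * (np.cos(np.pi * d / cutoff) + 1.0)   # cutoff function
    etas = np.array([0.5, 1.0, 2.0])
    return np.array([np.sum(np.exp(-eta * d**2) * fc) for eta in etas])

# A tiny fixed-weight atomic network NN_i: 3 -> 4 -> 1
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)

def atomic_energy(g):
    return (np.tanh(g @ W1 + b1) @ W2 + b2).item()

def total_energy(pos):
    """Eq. (1): sum of atom-centered NN energy contributions."""
    return sum(atomic_energy(descriptor(pos, i)) for i in range(len(pos)))

pos = rng.uniform(0, 5, size=(6, 3))
E = total_energy(pos)
E_perm = total_energy(pos[::-1])   # permuting atoms must not change E
```

Because the descriptor depends only on interatomic distances within the cutoff, the summed energy inherits permutation and translation invariance by construction.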

A key challenge underpinning interatomic potentials is the selection of good atom-centered structural descriptors (or features). A descriptor is a mathematical representation of the local atomic environment, encoding structural and chemical information to enable a smooth ML representation of energies, forces, and other properties as a function of the atomic positions and species. To be physically meaningful, a descriptor must remain invariant under the symmetry transformations of the system, such as rigid rotations, translations, and permutations. Correctly accounting for symmetries is crucial in MLIPs, especially for surfaces, which often exhibit multiple symmetries. Behler et al. were also among the first to apply NNs in surface science. They used general symmetry functions based on atomic Fourier terms, replacing the inconvenient empirical functions presented by previous authors to describe the interaction between O2 and a frozen Al(111) surface161. Jiang, Guo, and colleagues introduced the permutation invariant polynomial NN (PIP-NN), which uses permutation invariant polynomials as inputs, preserving both molecular permutation symmetry and surface translational symmetry. They applied this approach to simulate H2 on frozen Cu(111) and Ag(111) surfaces162,163,164.

Methods based on high-dimensional NNs (HDNN) by Behler and Parrinello160 allow the inclusion of surface degrees of freedom and form the most common type of MLIP applied for dynamics at surfaces. Guo and co-workers were the first to use HDNNs to investigate HCl scattering events on a dynamic Au(111) surface165,166. Gerrits employed the HDNN code RuNNer100 to model H2 reactions on a curved Pt crystal. This PES model, trained on data covering multiple Pt surfaces, allowed for larger unit cells without significant accuracy loss, making it suitable for simulating realistic crystal systems167. HDNNs were also used to study solid-liquid systems, in particular, for the dynamics of water at metal-based surfaces. Natarajan and Behler employed HDNN-based PESs to investigate water-copper interfaces for low-index Cu surfaces168 and stepped surfaces, including surface defects169.

Jiang and co-workers developed the embedded atom NN (EANN), which was shown to provide highly efficient force evaluations compared to previous HDNN-based approaches and enabled accurate model construction with fewer data points101. This method has been applied, for example, to simulate the scattering dynamics of NO on Ag(111)170 or CO adsorption on Au(111)171,172. Later, a piece-wise EANN (PEANN) was proposed, in which the original Gaussian-type orbitals used for descriptors are replaced with piece-wise switching functions, leading to significant improvement in efficiency when applied to dissociative chemisorption of CH4 on flat Ir(111) and stepped Ir(332)173. A notable improvement to EANN was introduced in recursive EANN (REANN), in which a message-passing scheme was implemented for orbital coefficients, leading to increased accuracy and transferability of the models. REANN was applied, e.g., to model bulk SiO2 or the dissociative chemisorption of CO2 at Ni(100)174,175.

Equivariant interatomic potentials

MLIPs based on deep message-passing NNs (MPNNs) have been gaining significant traction. These models learn not only a target property but also the input descriptor in an end-to-end fashion, promising a reduction of bias compared to hand-crafted descriptors. One of the first deep atom-centered message-passing NNs is the SchNet model103,104. A recent milestone is the advent of equivariant MPNNs, whose outputs transform consistently with transformations of the inputs, allowing them to capture the complex geometric relationships between atoms. Batzner et al. developed neural equivariant interatomic potentials (NequIP), an equivariant MLIP based on MPNNs, which shows excellent accuracy in predicting energies and forces. It incorporates symmetry awareness by using E(3)-equivariant convolutions for interactions of geometric tensors. With this, the algorithm significantly reduces the amount of required data (by up to 3 orders of magnitude) compared to other contemporary algorithms. The authors employed NequIP to study formate dehydrogenation on a Cu(110) surface102. Stark et al. explored the importance of equivariance by comparing invariant (SchNet103) and equivariant (PaiNN114) MPNNs for H2 dissociative adsorption on Cu surfaces, demonstrating that equivariant features enhance atomic environment descriptiveness, leading to more accurate models and smoother energy landscapes while requiring fewer training data points176.

Foundation models

Recently, several foundation models (also called universal MLIPs), including MACE-MP-0177, ANI/EFP178, M3GNet-DIRECT179, ALIGNN-FF180, and CHGNet181, have been introduced. These models are trained on a vast number of chemical species and data points with the ambition of providing transferable predictions across a diverse range of systems. They have been successfully employed in modeling heterogeneous catalysis or water/SiO2 interface dynamics177. Foundation models are pre-trained on large databases and often deliver qualitatively accurate predictions for tasks such as geometry optimization and formation energy prediction182. Foundation models can suffer from softening (systematic underprediction of energies and forces), which can be overcome by retraining (or fine-tuning) on additional first-principles datasets (as few as 100 data points)183,184. Conversely, retraining foundation models may reduce their universality and lead to decreased performance on certain systems, including those present in the refinement dataset184. However, in many cases, trading some degree of universality for accuracy in particular applications can be desirable. An example of such a case, for gas-surface dynamics (H2 dissociation at Cu surfaces), was recently proposed by Radova et al., who introduced the MACE-freeze method185, in which transfer learning with partially frozen model parameters is applied to the MACE-MP foundation model. The approach provided highly data-efficient MLIPs, achieving accuracy similar to from-scratch-trained models with only 10-20% of the data points. The authors also showed that training a light, linear model (ACEpotentials186) on data generated with the MACE-freeze model as ground truth can lead not only to better efficiency (17 times faster force evaluation compared to the “small” MACE-MP model) but also to improved accuracy relative to from-scratch-trained linear models.
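The frozen-parameter transfer-learning idea can be illustrated generically (this is not the MACE-freeze implementation): a "foundation" feature extractor is kept fixed, and only a lightweight readout is refit on a small new dataset. All names and the toy target below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Foundation" feature extractor: a random, frozen hidden layer
W_frozen = rng.normal(scale=2.0, size=(1, 32))
b_frozen = rng.normal(size=32)
features = lambda x: np.tanh(x @ W_frozen + b_frozen)

# Fine-tuning = refitting only the linear readout on a small dataset
X_small = rng.uniform(-1, 1, (100, 1))
y_small = np.sin(3 * X_small).ravel()          # toy target "PES"

Phi = features(X_small)                         # frozen features
w_head, *_ = np.linalg.lstsq(Phi, y_small, rcond=None)

X_test = rng.uniform(-1, 1, (100, 1))
pred = features(X_test) @ w_head
mae = np.mean(np.abs(pred - np.sin(3 * X_test).ravel()))
```

Freezing the body of the model restricts optimization to a small, well-conditioned parameter set, which is what makes fine-tuning on only a fraction of the data feasible.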

Data generation

Training of MLIPs requires a large number of data points, the generation of which is computationally expensive, particularly due to the high computational cost associated with surface slab calculations. Adaptive sampling, also known as on-the-fly training or active learning, seeks to select the most informative data points to optimize the learning process with minimal data. Two strategies are particularly prevalent in surface science: uncertainty-based sampling and diversity sampling. In uncertainty-driven sampling, the model selects data points where its predictions are least confident, often focusing on points near decision boundaries or the inherent uncertainties of a Bayesian model27,187. Uncertainty-based sampling may involve balancing exploration and exploitation, as seen in Bayesian optimization methods. While leveraging uncertainties for exploration, such methods can also enhance exploitation, e.g., via a thermodynamic likelihood188. Exploration can also be enhanced through additional techniques, e.g., by perturbing geometries189. Another uncertainty-based method is query-by-committee190, where multiple models probe data points, and new training points are selected where the models show the most disagreement. Query-by-committee was employed by Artrith and Behler in their development of NN potentials for copper surfaces191. Since then, its popularity has grown significantly, and we direct the interested reader to recent topical reviews192,193. Diversity sampling follows a different strategy by ensuring that selected data points are spread across the input space, preventing redundancy and ensuring a broad representation of the dataset. When a large amount of ab initio MD data is available, trajectory subsampling techniques can also be employed to extract the most informative data points from numerous trajectories194.
Interestingly, a combination of diversity sampling (clustering) and uncertainty-based sampling can significantly improve learning rates of MLIPs compared to using exclusively one or the other method195.
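A minimal query-by-committee sketch: a committee of models (here, polynomials of different degrees standing in for an MLIP ensemble) is trained on a labeled region, and the next configuration to label is the candidate on which the committee disagrees most.

```python
import numpy as np

f = lambda x: np.sin(3 * x)             # stand-in for an expensive PES
x_train = np.linspace(0.0, 1.0, 15)     # labeled region
y_train = f(x_train)
x_pool = np.linspace(0.0, 3.0, 301)     # unlabeled candidate pool

# Committee: models of different flexibility fit to the same data
committee = [np.polyfit(x_train, y_train, deg) for deg in (2, 3, 4, 5)]
preds = np.stack([np.polyval(c, x_pool) for c in committee])

disagreement = preds.std(axis=0)         # committee spread per candidate
query = x_pool[np.argmax(disagreement)]  # label this point next
```

The committee agrees inside the well-sampled region and diverges strongly in the extrapolation region, so the query lands where new ab initio data is most informative.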

Inclusion of long-range effects

Long-range dispersion interactions often constitute one of the dominant interactions at hybrid organic/inorganic interfaces. For instance, the perylenetetracarboxylic dianhydride (PTCDA) molecule on Ag(111) is predominantly bonded by long-range interactions, despite also being anchored by peripheral oxygen atoms15,28. These interactions are non-local by nature, and describing them within an MLIP may require non-local descriptors or larger cutoffs than otherwise needed, which can render ML models inefficient. Using SchNet, Westermayr et al. developed a long-range dispersion-inclusive MLIP that facilitates structure search and geometry optimization of organic/inorganic interfaces196. They used two NNs, one to learn the vdW-free interaction energy and one to learn the Hirshfeld volume ratios, allowing the calculation of vdW interactions as a correction to the vdW-free NN. Around the same time, Piquemal197 and Caro198 introduced similar approaches. Other long-range-separated MLIPs include the Long-Distance Equivariant (LODE) method199,200, which employs local descriptors to represent Coulombic and other asymptotically decaying potentials around atoms, and the Latent Ewald Summation (LES)201, designed specifically to address long-range interactions in atomistic systems. Long-range electrostatics, in addition to long-range dispersion, have been included in the HDNN models by Behler202 and the SpookyNet model203.
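The two-model construction can be sketched schematically: a short-range model supplies the vdW-free energy, and a separate pairwise damped -C6/r^6 term, of the kind a second network can parameterize via predicted Hirshfeld volume ratios, adds the dispersion correction. Both energy terms below are toy stand-ins, not the models of ref. 196.

```python
import numpy as np

def short_range_energy(pos):
    """Stand-in for the vdW-free MLIP (e.g. a trained NN); here a toy
    pairwise exponential repulsion so the example is self-contained."""
    d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    iu = np.triu_indices(len(pos), k=1)
    return np.sum(np.exp(-2.0 * d[iu]))

def vdw_correction(pos, c6, d0=3.0, steepness=6.0):
    """Pairwise -C6/r^6 dispersion with a Fermi-type damping function;
    in the two-NN scheme, the per-atom C6 values would be derived from
    predicted Hirshfeld volume ratios (here they are given directly)."""
    e = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(pos[i] - pos[j])
            c6ij = np.sqrt(c6[i] * c6[j])        # simple combination rule
            damp = 1.0 / (1.0 + np.exp(-steepness * (r / d0 - 1.0)))
            e -= damp * c6ij / r**6
    return e

pos = np.array([[0., 0., 0.], [0., 0., 4.0], [0., 0., 8.0]])
c6 = np.array([10.0, 10.0, 10.0])
E = short_range_energy(pos) + vdw_correction(pos, c6)
```

Separating the asymptotic tail from the short-range model keeps the MLIP cutoff small while still capturing interactions that decay far beyond it.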

Benchmarking potential accuracy

The accuracy of MLIPs is often assessed using standardized datasets such as MD17204, QM9205, OC20111, and OC22206. The MD17 database contains MD trajectories for ten small organic molecules. The QM9 database contains relaxed geometries of 134,000 small, stable organic molecules composed of C, H, O, N, and F, for which geometric, energetic, electronic, and thermodynamic properties have been computed. Table 2 presents mean absolute errors for predicted energies. Almost all published MLIP models have been assessed on the MD17 or QM9 datasets, which both cover only organic molecules and are not directly relevant to surface science problems. More applicable to surface science is the open catalyst (OC) project111,206. The OC20 dataset comprises over 264 million data points, featuring relaxed and unrelaxed structures, adsorption energies, and atomic forces for various catalyst-adsorbate interactions. OC22 further extends OC20 with 9.8 million data points of complex reaction pathways and dynamic simulations, adding temporal data and focusing on reaction kinetics, making it a valuable resource for studying atomic-scale catalytic processes. Table 2 provides a comprehensive summary of literature benchmarks on these datasets (Supplementary Table 1 in the Supplementary information provides additional published benchmarks). In the case of the OC20/22 datasets, we focus on two tasks highly relevant to surface science: (A) the prediction of the DFT total energy and forces for a given structure – structure to energy and forces (S2EF); and (B) the prediction of the relaxed DFT total energy for a given initial structure – initial structure to relaxed energy (IS2RE).

Table 2 MAE in meV for the prediction of energies (and geometry optimizations in case of OC20 IS2RE); Blue and red colors indicate small and large MAEs respectively; MLIPs trained and tested on MD17 (1000 training data points), QM9 (110,000 training data points), OC20 (460,328 training data points), and OC22 (8,225,293 training data points); MAEs on OC20/22 for out-of-domain set; MAEs are taken from102,112,175,206,253,254,255,256,257,258,259,260,261,262,263,264; A more detailed table with MAEs can be found in the Supporting Information25,62,101,102,103,112,113,114,115,175,255,258,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279

Benchmarking prediction speed of potentials

Often the primary measure of the quality of MLIPs is the prediction accuracy, usually quantified by the mean absolute error (MAE) and the root mean square error (RMSE) on a test set. However, time-to-solution can be equally important given the need for accurate statistical sampling of nonequilibrium gas-surface dynamics. Stark et al. benchmarked MLIPs such as ACE, MACE, REANN, and PaiNN207, evaluating accuracy via statistical ensemble averaging and comparing evaluation speeds on CPU and GPU architectures (Fig. 4). MACE and REANN models achieved good accuracy and efficiency on CPU architectures, which are most suitable for massively parallel multi-trajectory dynamics. MACE provided superior performance on GPU architectures. As a linear model, ACE is the fastest on CPU architectures, but providing training data to ensure consistent accuracy is more challenging than for the other MLIPs.

Fig. 4: Comparison of MLIP accuracy and time-to-solution.
figure 4

Force test RMSEs in meV/Å plotted against force evaluation time in milliseconds from ref. 207 obtained with models based on different architectures, such as PaiNN (squares), REANN (circles), MACE (triangles), and ACE (stars). Data points and error bars correspond to the RMSE average and standard deviation over 5-fold cross-validation splits. The striped circle corresponds to the model obtained with the newer REANN version, which uses a Fortran-based neighbor list calculator. Black markers indicate models evaluated using GPU architecture. The CPU evaluation times were calculated using a single AMD EPYC 7742 (Rome) 2.25 GHz CPU core. GPU evaluation times were obtained with an NVIDIA V100 GPU.
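Accuracy and time-to-solution can be reported together with a few lines of bookkeeping. The sketch below computes force MAE/RMSE in meV/Å and mean wall-clock evaluation time; the random arrays stand in for reference and model forces, and the numbers are not those of ref. 207.

```python
import time
import numpy as np

def benchmark(model_forces, ref_forces, eval_fn, pos, n_repeat=10):
    """Report force RMSE/MAE (meV/A) and mean wall-clock evaluation
    time (ms) for a force-evaluation callable."""
    err = model_forces - ref_forces
    rmse = 1000.0 * np.sqrt(np.mean(err**2))
    mae = 1000.0 * np.mean(np.abs(err))
    t0 = time.perf_counter()
    for _ in range(n_repeat):
        eval_fn(pos)
    dt_ms = 1000.0 * (time.perf_counter() - t0) / n_repeat
    return rmse, mae, dt_ms

# Toy stand-ins: "reference" forces vs slightly perturbed "model" forces
rng = np.random.default_rng(0)
ref = rng.normal(size=(64, 3))
model = ref + rng.normal(scale=0.02, size=ref.shape)   # ~20 meV/A noise
rmse, mae, dt = benchmark(model, ref, lambda p: np.sum(p**2),
                          rng.normal(size=(64, 3)))
```

Reporting both axes, as in Fig. 4, makes the accuracy/speed trade-off between architectures explicit rather than implicit.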

Open challenges

What is evident from the overview in Table 2 is the existence of a benchmarking gap: most interatomic potentials are evaluated on molecular datasets rather than surface structures. Notably, various potentials, such as ACE, MACE, or REANN, are routinely applied in surface dynamics but have not yet been benchmarked on the OC20/22 datasets. Moreover, there exists an accuracy gap: for example, the OC20/22 datasets were computed at the GGA(+U) level of theory111,206, while the molecular database QM9 uses the hybrid-DFT (B3LYP) level of theory205. Surfaces and interfaces are governed by mechanisms such as hybridization, charge transfer, Pauli repulsion, vdW interactions, level alignment, and surface-mediated electronic states15, often requiring beyond-GGA accuracy, as exemplified by the CO on metals puzzle16,17. To complicate matters further, relevant systems frequently comprise hundreds or thousands of atoms, limiting the ability to use a suitable level of theory. These challenges make generating the underlying data difficult and computationally expensive, and can introduce greater intrinsic uncertainty due to varying convergence thresholds.

Excited states and nonadiabatic dynamics

In most cases, the Born-Oppenheimer approximation is valid for describing the dynamics at surfaces; however, especially when considering dynamics at metals or low-bandgap semiconductor surfaces, electronic excitations and ensuing nonadiabatic effects cannot be ignored. Examples of approaches that can be used to simulate nonadiabatic electron-nuclear coupling effects in MD include Ehrenfest dynamics, a range of trajectory surface hopping methods208,209, and MD with electronic friction (MDEF)210. MDEF has been commonly used in the study of reactive scattering and light-driven dynamics of small molecules at metal surfaces. In this method, the dynamics are propagated on a single ground state PES. Atoms experience additional forces due to electronic friction and a random white noise term within a Langevin dynamics framework (see Fig. 5). Two approaches have been proposed to calculate the electronic friction tensor (EFT) from first principles: The local density friction approximation (LDFA)211 and first-order response theory based on DFT (otherwise known as orbital dependent friction or ODF)212,213. In LDFA, the EFT is reduced to scalar friction values based on a bare surface electron density. ODF is calculated from Kohn-Sham DFT through time-dependent perturbation theory and provides a full EFT.

Fig. 5: Exploration of nonadiabatic effects at metal surfaces using MDEF.
figure 5

MLIPs can be combined with electronic friction ML models within the MDEF to accelerate MD simulations beyond the adiabatic approximation.
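The Langevin propagation underlying MDEF can be sketched in one dimension: the ground state force is augmented by an electronic-friction drag and a fluctuation-dissipation-consistent random force. A simple Euler scheme is used for transparency, not for production use.

```python
import numpy as np

def mdef_1d(x0, v0, mass, eta, temp_K, force, dt, n_steps, seed=0):
    """Minimal 1D Langevin integrator in the spirit of MDEF: the force
    is force(x) - eta*v plus a random force whose variance satisfies
    the fluctuation-dissipation relation at temperature temp_K."""
    kB = 8.617e-5                      # eV/K
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(2.0 * eta * kB * temp_K / dt)
    x, v = x0, v0
    for _ in range(n_steps):
        f = force(x) - eta * v + sigma * rng.normal()
        v += f / mass * dt
        x += v * dt
    return x, v

# Free particle at T = 0: electronic friction alone damps the velocity
x, v = mdef_1d(x0=0.0, v0=1.0, mass=1.0, eta=0.5, temp_K=0.0,
               force=lambda x: 0.0, dt=0.01, n_steps=200)
```

In a full MDEF simulation, `force` would come from an MLIP, and the scalar `eta` would be replaced by the (possibly ML-predicted) electronic friction tensor acting on all nuclear degrees of freedom.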

Within a static surface approximation, LDFA can be efficiently evaluated from first principles, requiring only a single calculation of the surface electron density. Alducin, Juaristi, and co-workers published a series of studies on light-driven surface reactions, incorporating NN-based MLIPs in combination with LDFA friction, e.g., to study laser-driven desorption214,215. However, especially at high temperatures, the inclusion of surface degrees of freedom may be crucial in simulating processes at surfaces, which can benefit from the use of flexible and efficient surrogate models of the EFT in addition to the MLIP that describes the PES. Alducin, Juaristi, and co-workers addressed this in later studies by employing a simplified density model based on exponentially decaying functions that allows fast and accurate predictions of electronic friction within LDFA as a function of surface atom motion216,217,218.

Utilizing ML techniques to represent the EFT is crucial within first-order perturbation theory (ODF) due to the high cost of linear response calculations in DFT. Spiering, Meyer, and co-workers introduced a symmetry-adapted six-dimensional NN-based EFT model for H2 and D2 on Cu(111) and for N2 on the Ru(0001) surface219,220. Zhang, Maurer, and co-workers created an NN-based EFT model that accounts for the covariance properties of the EFT with respect to the surface symmetry using a simple mapping scheme and used it to study the reactive scattering of H2 on the Ag(111) surface221,222. The authors further improved the model by preserving the positive semidefiniteness, directional property, and correct symmetry-equivariance of the EFT and tested it on the same example of H2 dynamics at the Ag(111) surface223. The model was also employed to study NO dynamics on Au(111)224. Recently, Sachs et al.225 introduced an Atomic-Cluster-Expansion-based EFT model (ACEfriction), utilizing equivariant representations of tensor-valued functions that satisfy all symmetries of the EFT by construction, allowing highly accurate and efficient prediction of friction and diffusion tensors. The construction of ACEfriction provides size-transferability by enabling the prediction of EFTs for multiple adsorbates and larger friction tensors. The model was tested on the NO/Au(111) system and was also applied by Box et al. in the context of H atom scattering on Pt226.

Nonadiabatic effects are also explored with other techniques. Several ML surrogate models of effective Hamiltonians have been constructed, most commonly coupled with trajectory surface hopping dynamics methods227,228,229,230,231,232,233,234. For example, Liu et al. combined an ML-based Hamiltonian surrogate and force fields with decoherence-induced surface hopping to study defects in MoS2235. Meng et al. used ML to predict excited states based on constrained DFT, constructing an effective diabatic Hamiltonian that was propagated with independent-electron surface hopping dynamics; this was used to study the nonadiabatic dynamics of CO scattering on Au(111)236. ML surrogate models accelerate, and crucially enable, quantum and mixed quantum-classical simulations of gas-surface dynamics, which makes this an exciting application area of ML in the coming years.
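The surrogate Hamiltonians above typically feed into surface hopping workflows. As an illustration of the hopping step itself, here is a generic Tully-style fewest-switches probability in its textbook form; sign and normalization conventions vary between implementations, and this is not the specific algorithm of the cited works:

```python
import numpy as np

def fewest_switches_prob(c, d_kj, v, dt, k, j):
    """Probability to hop from active state k to state j in one timestep.
    c: complex electronic amplitudes; d_kj: nonadiabatic coupling vector
    between k and j; v: nuclear velocity vector."""
    a_kk = np.abs(c[k]) ** 2
    b_jk = -2.0 * np.real(np.conj(c[j]) * c[k] * np.dot(v, d_kj))
    return max(0.0, b_jk * dt / a_kk)  # negative population flux gives zero probability

# toy two-state example: 90% population on the active state
c = np.array([np.sqrt(0.9), np.sqrt(0.1)], dtype=complex)
p = fewest_switches_prob(c, d_kj=np.array([-0.05]), v=np.array([1.0]), dt=0.1, k=0, j=1)
# a hop is then accepted if a uniform random number falls below p
```

In an actual simulation, an ML surrogate supplies the energies, amplitudes, and couplings entering this expression at every timestep.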

Open challenges

Most applications on excited-state and nonadiabatic dynamics at surfaces have so far focused on dynamics within the classical-path approximation, the mean-field approximation, or molecular dynamics with electronic friction. These methods have uncontrolled and insufficiently assessed errors and are limited to situations of weak nonadiabaticity and strong adsorbate-substrate hybridization237. To enable the application of more robust mixed quantum-classical simulation methods to high-dimensional surface systems, ML electronic structure surrogates (as described in Large-scale electronic property and spectroscopy prediction) will play a crucial role. These models will need to cope with sparse data of suboptimal accuracy, as accurate and scalable first-principles reference calculations of excited-state properties of surfaces, for example based on many-body perturbation theory, are not routinely available for data generation. Equally important will be the development of improved dynamics methods that can efficiently capture not only nonadiabatic electron-phonon coupling but also quantum decoherence and electron-electron scattering effects, which cannot be neglected in surface systems with a high electronic density of states (DOS).

State of the field and future perspective

Data-driven and ML methods offer powerful tools to accelerate surface structure determination and energy and force prediction in surface dynamics simulations. ML approaches are used to accelerate geometry optimizations, enabling precise structure determination of molecular adlayers and surface clusters in systems with many degrees of freedom. They also aid in determining energy barriers and transition states, with ML-accelerated NEB methods as a prominent example. ML is increasingly used to enhance or even replace kMC methods. Early approaches to improving the efficiency of surface dynamics relied on corrugation reduction or modified Shepard interpolation. More recently, machine-learned PESs have been used to obtain cost-efficient energies and forces for MD simulations. Beyond the Born-Oppenheimer approximation, ML approaches are applied to learn electronic friction or excited-state surrogate Hamiltonians.

However, ML and data-driven methods in surface science often focus on simple or idealized systems, neglecting surface reconstructions, assuming low temperatures and ultra-high vacuum, focusing on single atoms or small molecules, or imposing commensurability. Highly relevant challenges, such as modeling incommensurate structures, kinetics and growth, high-pressure and high-humidity systems, solid-liquid interfaces with charged ions, electrochemical potentials and conditions, and light-matter interactions, push the limits of current approaches. Bridging this complexity gap requires efficient, massively scalable, and easily deployable ML tools trained on large, well-curated, and accurate datasets built on a synthesis of computational and experimental data. In the following, we summarize the most important challenges for the application of ML methods in surface science.

Benchmarking gap

There is a pressing need for more comprehensive benchmarking of MLIPs for surface simulations. As highlighted in Table 2, recent advancements in dynamic model development are promising, but comprehensive benchmarking of widely used MLIPs on specific surface science datasets, such as OC20/22, remains insufficient. Moreover, benchmarking on databases containing barriers (OC20NEB110) and thermodynamic properties remains largely unaddressed. Addressing this gap calls for a community-wide effort to test a broader range of models on existing surface science datasets while also creating new, more challenging datasets. These should extend beyond equilibrium geometries and transition-state data, capturing the complexity of gas-surface dynamics and other non-equilibrium dynamical phenomena. Establishing a standardized benchmark dataset for gas-surface dynamics could provide a valuable foundation for the assessment of MLIPs.
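At its core, benchmarking an MLIP on such datasets reduces to accumulating per-component errors against reference data and reporting summary metrics. A minimal sketch with a toy harmonic "reference" and a hypothetical model carrying a constant force offset (both are illustrative stand-ins, not real datasets or models):

```python
import numpy as np

def benchmark_forces(model, dataset):
    """Force-component MAE and RMSE of an MLIP against reference data.
    dataset: iterable of (positions, reference_forces) pairs."""
    errors = []
    for positions, f_ref in dataset:
        errors.append((model(positions) - f_ref).ravel())
    errors = np.concatenate(errors)
    return {"mae": float(np.abs(errors).mean()),
            "rmse": float(np.sqrt((errors ** 2).mean()))}

# toy check: harmonic reference forces, "MLIP" with a constant 0.1 offset
rng = np.random.default_rng(0)
dataset = [(p, -p) for p in rng.normal(size=(5, 8, 3))]
metrics = benchmark_forces(lambda p: -p + 0.1, dataset)
```

A standardized benchmark would fix the dataset, the error metrics, and the evaluation protocol so that numbers reported for different MLIPs become directly comparable.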

Accuracy gap

Most current datasets in surface science are derived from DFT, predominantly at the GGA level of theory. While GGA is widely used, it is often inadequate for accurately capturing key surface properties, particularly energy barriers and adsorption energies. Datasets based on hybrid DFT or more advanced exchange-correlation energy descriptions and beyond-DFT methods remain scarce, and their reliability for surface science problems is not fully understood238. Even more critically, there is a significant lack of high-quality experimental data on structure, stability, and kinetics to guide or validate theoretical models50. Relying solely on learning from GGA-DFT data will not fundamentally address the challenges in the electronic structure description of surfaces and interfaces.

Computational surface science data often contains more intrinsic noise and uncertainty compared to molecular data, which needs to be considered when applying ML methods. This is because surfaces are inherently more complex and require larger system sizes than molecules or bulk materials15. The difficulty in imposing tight convergence criteria results in the absence of a definitive “gold standard” for accurate first-principles surface computations. Depending on the system, different computational approaches may be suitable15. Exchange-correlation functionals differ widely in their predictive capabilities, and even for the same density functional approximation, different codes provide different results due to numerical approximations239.

Data gap

Many computational approaches in surface science are data-hungry, often requiring thousands of data points for accurate predictions. Computational data generation is costly due to the high demands of quantum mechanical calculations for large systems, particularly those with long-range interactions such as incommensurate structures or surface reconstructions. Experimental data generation often requires highly controlled environments and time-consuming sample preparation. Efforts to address this include efficient training algorithms such as optimal selection and active learning, which rely on estimators for information gain. A common estimator is the model uncertainty, which, however, often fails to align with true errors, leading to overconfidence in regions poorly covered by training data and a failure to generalize effectively. In highly regularized equivariant MPNNs, uncertainty estimation methods like bootstrapping or committees are less effective115. Addressing these shortcomings requires improvements in uncertainty quantification techniques, better validation protocols, transfer learning, and the incorporation of multi-fidelity approaches that combine high-accuracy data with less expensive, approximate calculations.
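A committee (ensemble) uncertainty estimate of the kind used for active-learning data selection can be sketched in a few lines. The toy committee members below agree near x = 0 and diverge away from it, mimicking extrapolation beyond the training region; everything here is illustrative, not a real MLIP ensemble:

```python
import numpy as np

def committee_uncertainty(models, x):
    """Mean and spread of committee predictions; the spread serves
    as the acquisition score in active learning."""
    preds = np.array([m(x) for m in models])
    return preds.mean(axis=0), preds.std(axis=0)

# toy committee: members agree where "training data" exists (x ~ 0)
# and disagree in the extrapolation region (large x)
rng = np.random.default_rng(0)
models = [lambda x, w=rng.normal(1.0, 0.2): np.sin(x) + w * 0.1 * x**2
          for _ in range(8)]

x_grid = np.linspace(0.0, 5.0, 50)
mean, std = committee_uncertainty(models, x_grid)
next_point = x_grid[np.argmax(std)]  # query where the committee disagrees most
```

The failure mode discussed above corresponds to the committee spread staying small in a region where all members are confidently wrong, so the acquisition score never flags it.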

Transfer learning has gained increasing attention, particularly with the rise of foundation models (see Accurate dynamics at large time- and length scales). While research has primarily focused on MLIPs, there remains a significant gap in developing reliable methods to assess the transferability of ML models across systems. Ensuring transferability between small- and large-scale systems, or across different surfaces and compositions, is crucial. Additionally, generating validation data for large systems will be necessary to confirm this transferability. Little research has been conducted so far on transfer learning for the direct prediction of kinetic properties, barriers, or other properties beyond energies and forces.

The scarcity of high-fidelity computational and experimental data underscores the urgent need for data synthesis methods and multi-fidelity learning, such as multi-head NNs and transfer learning185,240,241. These methods can synthesize information across different levels of theory, combining data from lower-accuracy methods (e.g., GGA-DFT) with more accurate beyond-DFT calculations, such as hybrid DFT, GW, or random phase approximation, and experimental observations. Data augmentation methods, where experimental data is supplemented with synthetically generated datasets, have shown promise for improving the analysis of scattering experiments242. Integrated approaches will help to bridge the gaps between theoretical approximations and experiments, for instance enhancing the modeling of transition barriers, reaction rates, phase diagrams, and spectroscopy. They may also be able to optimize experiments and simulations by guiding resource allocation and improving data efficiency.
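The simplest multi-fidelity scheme, Δ-learning, trains a correction from a cheap level of theory to an expensive one on a small amount of high-fidelity data. The sketch below uses toy analytic "levels of theory" and a polynomial fit as a stand-in for an ML regressor; none of this corresponds to real DFT data:

```python
import numpy as np

# toy "levels of theory": a cheap baseline and an expensive reference
def e_cheap(x):      # abundant, lower-accuracy energies (e.g. GGA-DFT)
    return x ** 2

def e_expensive(x):  # scarce, high-fidelity energies (e.g. hybrid DFT)
    return x ** 2 + 0.3 * np.sin(3 * x)

# learn only the difference, on a small high-fidelity training set
x_hi = np.linspace(-2.0, 2.0, 12)
delta = e_expensive(x_hi) - e_cheap(x_hi)
coeffs = np.polyfit(x_hi, delta, deg=7)  # stand-in for an ML regressor

def e_delta(x):
    """Cheap baseline plus learned high-fidelity correction."""
    return e_cheap(x) + np.polyval(coeffs, x)

x_test = np.linspace(-2.0, 2.0, 101)
err_baseline = np.abs(e_cheap(x_test) - e_expensive(x_test)).mean()
err_delta = np.abs(e_delta(x_test) - e_expensive(x_test)).mean()
```

The correction is typically smoother than the full target, which is why only a few high-fidelity points are needed; multi-head NNs generalize this idea by sharing a representation across fidelities.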

Open and FAIR data

Computational surface science generates vast amounts of data, but only a fraction is typically relevant to specific tasks. This data must be publicly available and well-curated, in compliance with the FAIR data principles243. Table 3 lists common databases that feature surface science data. Most databases are not exclusive to surface science and contain data from the wider field of computational chemistry/physics. Establishing large, consistent datasets is crucial for advancing ML in computational surface science. To foster reproducibility, knowledge transfer, and data reuse, we urge the community to make well-curated computational data publicly accessible.

Table 3 List of common databases for computational materials data as well as automated data generation tools

Efficient and accurate inference

A notable challenge is the improvement of inference performance. Many current ML models are, while powerful and accurate, too computationally expensive for high-throughput predictions of properties or long-time-scale and large-system-size dynamics simulations in surface science. This is particularly important for nonequilibrium dynamics at surfaces that require ensemble averaging over tens of thousands of trajectories (e.g., to evaluate quantum-state-resolved reactive scattering probabilities). New developments in the accurate and efficient description of atomic environments and in MLIP architectures continue to improve accuracy and transferability, while reducing the computational cost of model evaluation, as shown in Fig. 4. GPU architectures and workflows can further reduce computational costs. However, the current bottleneck in deploying or renewing GPU compute architectures should also encourage us to continue to seek faster prediction on CPU architectures and models that benefit from just-in-time compilation.

Scalability and deployability are equally important. Efficiency goes beyond simply reducing inference time; it involves creating scalable and automated workflows that are adaptable to diverse and heterogeneous computing architectures. For complex simulations of nonequilibrium dynamics at surfaces—such as gas-surface interactions, atomic layer deposition, chemical vapor deposition, or rare event sampling—an effective ML approach must support high-throughput ensemble sampling. Achieving this requires solutions that balance computational loads across GPUs and CPUs efficiently. ML frameworks must support batched evaluations to maximize GPU utilization while simultaneously enabling parallel task farming for ensemble dynamics propagation. This dual focus ensures scalability in handling both individual trajectory evaluations and the broader orchestration of ensemble simulations. Furthermore, workflow integration is key. Effective ML deployments for surface science must seamlessly integrate data preprocessing, on-the-fly model evaluation, and post-processing while maintaining robustness and flexibility207,244. Moreover, it is essential to integrate ML model predictions with downstream tasks, such as predicting electronic properties or Hamiltonians, as well as incorporating them into efficient bandstructure calculations245.
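The batched-evaluation idea can be sketched as follows: geometries from all active trajectories are stacked into a single array, so the force model is called once per timestep for the whole ensemble rather than once per trajectory. In this generic NumPy sketch, a toy harmonic potential stands in for a batched MLIP evaluated on a GPU:

```python
import numpy as np

def toy_forces(positions):
    """Stand-in for a batched MLIP call: (n_traj, n_atoms, 3) -> same shape."""
    return -positions  # harmonic toy potential centered at the origin

def propagate_ensemble(positions, velocities, masses, dt, n_steps):
    """Velocity Verlet over a whole trajectory ensemble at once,
    amortizing the model-evaluation cost across trajectories."""
    forces = toy_forces(positions)
    for _ in range(n_steps):
        velocities = velocities + 0.5 * dt * forces / masses
        positions = positions + dt * velocities
        forces = toy_forces(positions)  # one batched call per timestep
        velocities = velocities + 0.5 * dt * forces / masses
    return positions, velocities

n_traj, n_atoms = 1000, 4
rng = np.random.default_rng(0)
pos = rng.normal(size=(n_traj, n_atoms, 3))
vel = np.zeros((n_traj, n_atoms, 3))
masses = np.ones((1, n_atoms, 1))  # broadcasts over trajectories and components
pos, vel = propagate_ensemble(pos, vel, masses, dt=0.05, n_steps=100)
```

Task farming complements this picture: when the ensemble exceeds device memory, batches of trajectories are distributed across workers, each propagating its batch with the same batched model call.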

Machine learning and the future of surface characterization

Multi-technique characterization is a central paradigm in surface science, encompassing various spectroscopic methods, diffraction techniques, and imaging approaches such as scanning probe microscopy. Despite their importance, there has been limited progress in leveraging ML to bridge the gap between simulation and experiment in these techniques. This presents significant untapped potential for innovation and advancement as ML can use atomistic simulation data to complement highly integrated experimental measurement signals. Recent developments illustrate this promise, including the prediction of spectroscopic properties such as Raman and surface-enhanced Raman scattering measurements246, surface characterization using deep learning and infrared spectroscopy247, and classification in X-ray absorption and emission spectroscopy248. Efforts such as learning metastable phase diagrams249, reaction networks140, and catalytic activity from experimental data121 pave the way to learning end-to-end transition rates from experimental data. These efforts demand extensive experimental databases supported by automated labs, similar to the recently proposed OCx24 database250. Finally, efforts such as the application of reinforcement learning to automate scanning probe experiments251 will benefit from models that are aware of hidden atomic-scale conformations252, for example by being trained on complementary first-principles data. These examples underscore the vast opportunities for ML to revolutionize surface science.

Surface science, perhaps more than other fields, offers a unique opportunity to integrate highly controlled experiments aimed at uncovering fundamental physics with the power of ML and data-driven approaches. Its distinct challenges, combined with its significant industrial relevance, continue to drive innovation in computational methods, fostering advancements with transformative impacts that extend well beyond the field itself.