Introduction

Many modern technological challenges crucially depend on the properties of surfaces and interfaces. This includes the control of charge and energy transfer across electrode/electrolyte interfaces in batteries1 and fuel cells2, the characterization of structure and dynamics of wear and lubrication at tribological interfaces3, the optimization of chemical transformations at metal surfaces in heterogeneous catalysis4, corrosion science5, and surface functionalization6. Surface and thin-film growth processes such as atomic layer deposition, chemical vapor deposition, and molecular beam epitaxy are of high industrial relevance. The increasing demand for functional thin films and surface nanostructures also drives increasing structural complexity. As most modern materials involve multiple components, understanding the structure and properties of thin films, composites, buried interfaces, and exposed surfaces is now more important than ever. The atomic-scale characterization and design of functional interfaces require understanding and manipulation at the nanoscale, which often cannot be delivered by experimentation alone.

Computational simulation of surface and interface processes has become central to modern surface science. Few fields rely as strongly on the synergy between atomistic simulation and experimental study. This synergy is achieved by minimizing the gap between experimental complexity and simulation models, often through the use of model surfaces and ultra-high vacuum conditions, which, for example, enable atomic resolution to be achieved in scanning probe microscopy. However, surfaces in real-world applications are often more complex, featuring defects and partial disorder. Additionally, ambient pressure and interacting molecules play crucial roles in many applications, such as catalysis. By advancing the study of complex surface systems and dynamic processes at large length- and time scales with high throughput, machine learning (ML) and data-driven approaches have the potential to bring atomistic simulation and experiment even closer, offering improved mechanistic understanding of surface dynamics, reaction pathways, growth processes, and mechanical and electronic properties.

Common ML methods used in surface science include neural networks (NN), Bayesian regression methods, decision trees, support vector machines, and genetic algorithms. Citations for specific uses are given in later sections. These methods can learn expressions for formation energies, potential energy surfaces (PES), and other properties, provide frameworks to efficiently explore the configuration space of a material, or facilitate the optimization of a target property. Machine-learned interatomic potentials (MLIPs), in particular, are highly impactful: they have revolutionized the simulation of bulk materials and are transforming chemical and biomolecular simulations7,8. MLIPs are omnipresent in surface science and will be discussed throughout this review. Considering the broad range of ML applications in surface science, we do not dedicate a separate chapter to MLIPs. Instead, we discuss them as part of the section on Accurate dynamics at large time- and length scales, as MLIPs are most commonly used to accelerate dynamics simulations.

Motivation of the review

A number of comprehensive reviews have been published on ML from different perspectives such as heterogeneous catalysis9,10,11,12 and experimental surface science13,14. This review targets computational surface scientists seeking to integrate ML techniques, providing an overview of current capabilities and limitations. Surfaces and interfaces present a unique challenge due to complex processes such as charge transfer and bond formation as well as competing interactions such as covalent, electrostatic, dispersion forces, and the coexistence of localized and delocalized electronic states in large unit cells with hundreds or thousands of atoms15. Examples such as the CO on metals puzzle16,17 demonstrate that semi-local Density Functional Theory (DFT) often falls short in accurately describing surfaces and interfaces.

Current challenges in surface science (sketched in Fig. 1), such as the modeling of two-dimensional materials, electronic interface engineering, light-matter interaction at interfaces, the description of realistic catalysts, incommensurate surface nanostructures, and surface electrochemistry, place specific demands on computational modeling capabilities, which machine learning and data-driven methods can help to address.

Fig. 1: Schematic depiction of exemplary grand challenge areas in surface science.
figure 1

From left to right: two-dimensional materials, electronic interface engineering, light-matter interaction at interfaces, description of realistic catalysts, incommensurate surface nanostructures, and surface electrochemistry. The frontier modeling requirements arising from these application challenges are listed in the bottom panel.

In this review, we focus on these broader computational simulation challenges spanning gas-surface, solid-liquid, and solid-solid interfaces. While each system presents unique challenges, many computational workflows are applicable across all types of surfaces and interfaces. We explore how new ML-enabled workflows are transforming these surface science applications, focusing on their role in (A) structure prediction of realistic surfaces, (B) thermodynamic stability and compositional phase space under realistic conditions, (C) large-scale electronic property prediction, (D) barrier prediction, (E) multiscale surface kinetics, (F) accurate dynamics at large time- and length scales, and (G) excited states, surface spectroscopy and nonadiabatic dynamics. We address the unique challenges of studying lower-dimensional systems compared to bulk or molecular systems and identify gaps in ML tools that, if filled, could greatly enhance computational surface science.

Structure prediction of surfaces and interfaces

Understanding the structure of an interface, surface, or adsorbed layer is crucial for studying properties such as charge transfer, surface states, level alignment, or the interface dipole. Surface structure prediction is challenging due to the vast number of possible structures and the computational expense of large unit cells containing hundreds of atoms. Advanced optimization methods and ML approaches have enabled effective local and global structure optimization and efficient sampling of the PES (see Fig. 2).

Fig. 2: Schematic representation of structure search tasks.
figure 2

A For local and global structure optimization, only part of the PES is sampled. B The phase space can be mapped by determining the entire PES.

Local structure optimization

Today, local structure optimization in surface science is considered a largely solved problem. Common local optimization routines such as gradient descent, Broyden-Fletcher-Goldfarb-Shanno, or Nelder-Mead can often be used in conjunction with first-principles computations. For particularly demanding problems, surrogate models can be employed: for example, using a Bayesian ML approach, Garijo del Río et al.18,19 accelerated local geometry optimizations of CO on Ag(111) and C on Cu(100). Their algorithm, called GPMin, conducts optimizations on a machine-learned PES and improves the underlying ML model by adding each newly found local minimum to the training data. GPMin is available in the atomic simulation environment (ASE)20.

Global structure optimization

Unlike local optimization, global optimization is significantly more challenging due to the vastly larger search space. The uniquely complex interactions at surfaces and interfaces (charge transfer, hybridization, level alignment, etc.) render purely first-principles approaches intractable15,21. ML based approaches, such as Bayesian regression, tree-based algorithms, genetic algorithms (GAs), and NNs have enabled significant advances in tackling complex systems with many degrees of freedom.

For instance, the group of Bjørk Hammer developed GOFEE (global optimization with first-principles energy expressions)22, which employs a Gaussian process regression (GPR) based surrogate model for energies. GOFEE makes efficient use of limited first principles data through adaptive sampling (see Accurate dynamics at large time- and length scales) and was applied to study the oxidation and oxygen intercalation of graphene on Ir(111). Similarly, Kaappa et al. used Gaussian processes to create a surrogate model for the PES to enable global optimization. Their algorithms, BEACON (Bayesian Exploration of Atomic Configurations for Optimization)23 and ICE-BEACON24, are available in the GPAtom package. Another approach successfully used for global optimization is Gaussian approximation potential (GAP)25. For example, Timmermann et al. applied GAP with simulated annealing to optimize low-index surfaces of rutile (IrO2)26.
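
To make the surrogate strategy behind these methods concrete, the following toy sketch performs a one-dimensional GPR-surrogate search: a Gaussian process with an RBF kernel is fit to all evaluated energies, and the next evaluation is placed where the predicted mean minus a multiple of the predicted uncertainty is lowest. This is only a schematic illustration of uncertainty-aware surrogate optimization, not the GOFEE or BEACON implementation; all function names and parameter values are our own choices.

```python
import math

def rbf(x1, x2, length=0.5):
    """Squared-exponential (RBF) kernel."""
    return math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(xq, xs, ys, jitter=1e-8):
    """GP posterior mean and variance at xq, given observations (xs, ys)."""
    n = len(xs)
    ybar = sum(ys) / n                       # centre targets on their mean
    K = [[rbf(xs[i], xs[j]) + (jitter if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, [y - ybar for y in ys])
    kq = [rbf(xq, xi) for xi in xs]
    mean = ybar + sum(a * k for a, k in zip(alpha, kq))
    w = solve(K, kq)
    var = max(1e-12, 1.0 - sum(k * wi for k, wi in zip(kq, w)))
    return mean, var

def gp_search(energy, lo=-2.0, hi=3.0, n_eval=15, kappa=2.0):
    """Place each new energy evaluation where mean - kappa*std is lowest."""
    xs, ys = [lo, hi], [energy(lo), energy(hi)]
    grid = [lo + (hi - lo) * i / 200 for i in range(201)]
    for _ in range(n_eval):
        def acq(g):
            m, v = gp_predict(g, xs, ys)
            return m - kappa * math.sqrt(v)   # lower confidence bound
        xn = min(grid, key=acq)
        xs.append(xn)
        ys.append(energy(xn))                 # expensive "first-principles" call
    return min(zip(ys, xs))                   # best energy found and its position
```

In a real structure search, the scalar coordinate would be replaced by a structural descriptor and the energy callback by a DFT evaluation; the acquisition logic stays the same.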

GPR is also the foundation of the BOSS code27. BOSS combines GPR with descriptors based on Cartesian and internal coordinates, using model uncertainty for active learning (see Accurate dynamics at large time- and length scales). BOSS requires only a few hundred energy evaluations to construct a five-dimensional PES, though the number of necessary training data points increases exponentially with the degrees of freedom. A similar approach was developed by Hörmann et al., using radial distance functions as descriptors to predict the PES of single molecules on surfaces28. We note that the SAMPLE21,29 and GAMMA30 approaches, discussed in the next chapter, facilitate structure prediction of molecular adlayers on surfaces.

A different approach was used by Li et al.31 to create a transferable ML model for the prediction of adsorption energies of single-atom alloys. They utilized XGBoost32, which implements a boosted-tree algorithm based on gradient boosting: a series of trees is built, where the first tree learns the original data and every subsequent tree learns the residual of the previous ensemble, thereby improving the accuracy. XGBoost is widely used in computational materials research and has, for instance, been employed to model molecular adsorption in metal-organic frameworks33 or for the discovery of heterogeneous catalysts34.
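
The residual-fitting idea behind gradient boosting can be sketched in a few lines. The example below builds depth-one regression trees (stumps), each fitting the residual left by the current ensemble; it is a toy illustration of the principle, not XGBoost itself, which adds regularization, second-order gradient information, and deeper trees.

```python
def fit_stump(xs, residuals):
    """Depth-one regression tree: best single split, leaves predict the mean."""
    best = None
    for split in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not right:
            continue                       # degenerate split, nothing on the right
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - ml) ** 2 for r in left)
               + sum((r - mr) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, split, ml, mr)
    _, split, ml, mr = best
    return lambda x: ml if x <= split else mr

def gradient_boost(xs, ys, n_trees=50, lr=0.3):
    """Boosted ensemble: each new stump fits the residual of the previous ones."""
    pred = [0.0] * len(xs)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        trees.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * t(x) for t in trees)
```

The learning rate shrinks each stump's contribution, so the ensemble approaches the target geometrically as residuals shrink.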

GAs have been widely applied for global optimization of surface structures. While many of these algorithms were originally developed to find minimum energy structures for bulk materials or proteins, they have since been adapted for surfaces. A prime example is USPEX (Universal Structure Predictor: Evolutionary Xtallography)35, which uses a GA to computationally predict crystal structures. Wen et al. used the USPEX code to determine the structure of a mixed-metal oxide monolayer grown on an oxide support36. A key challenge in performing GA optimizations with first-principles calculations is the high computational cost of surface calculations, prompting efforts to minimize the required computations. Chuang et al. developed a GA that avoids the evaluation of duplicate structures by ensuring that the structures in the pool either differ in energy or atomic displacements, allowing them to determine the reconstructions of semiconductor surfaces37. Computational effort can also be reduced by coupling first-principles calculations with less resource-intensive methods. For example, Bjørk Hammer’s group implemented a two-stage DFT optimization approach, which pre-screens candidate structures with less accurate, lower-cost methods to identify and remove duplicates. Promising new structures are then refined using more accurate DFT calculations38,39. Another efficient strategy is to pair a GA with a surrogate model for the target property. Jacobsen et al. used a kernel ridge regression ML model, built on the feature vectors of Oganov et al.41, with adaptive sampling (see Accurate dynamics at large time- and length scales) to generate training data and guide a GA for optimizing surface reconstructions of oxide materials40.
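
The duplicate-avoidance idea can be illustrated with a toy GA on a continuous test function: before computing the energy of a candidate, the pool of already-evaluated structures is searched for a near-identical geometry, whose stored energy is then reused. This is a schematic sketch in the spirit of the approach of Chuang et al., with hypothetical thresholds and a trivial coordinate-vector "structure", not their published algorithm.

```python
import random

def find_duplicate(candidate, pool, dx=0.05):
    """Return the stored energy of an already-evaluated structure whose
    coordinate displacements all fall below dx, or None if the candidate is new."""
    for member, energy in pool:
        if all(abs(u - v) < dx for u, v in zip(candidate, member)):
            return energy
    return None

def ga_minimize(energy_fn, ndim=2, pop=12, generations=30, seed=1):
    random.seed(seed)
    pool, saved = [], 0    # pool of unique evaluated structures; saved = skipped evaluations
    population = [[random.uniform(-2.0, 2.0) for _ in range(ndim)] for _ in range(pop)]
    for _ in range(generations):
        scored = []
        for struct in population:
            e = find_duplicate(struct, pool)
            if e is None:
                e = energy_fn(struct)          # expensive evaluation
                pool.append((struct, e))
            else:
                saved += 1                     # duplicate: energy reused, not recomputed
            scored.append((e, struct))
        scored.sort(key=lambda t: t[0])
        parents = [s for _, s in scored[: pop // 2]]
        # averaging crossover plus Gaussian mutation
        population = [[(u + v) / 2 + random.gauss(0.0, 0.1)
                       for u, v in zip(*random.sample(parents, 2))]
                      for _ in range(pop)]
    best_struct, best_e = min(pool, key=lambda t: t[1])
    return best_e, best_struct, saved
```

In an atomistic setting, the displacement check would compare atomic positions after alignment, and the energy function would be a DFT or MLIP call.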

Particle swarm optimization has also been applied to predict surface structures. A notable example is the CALYPSO (Crystal structure AnaLYsis by Particle Swarm Optimization) code42. Using CALYPSO, Lu et al. developed a method to explore surface structures such as diamond surface reconstructions featuring self-assembled carbon nanotube arrays43. The code has also been employed to predict solid-solid interface structures: optimal lattice-matched superlattices are determined, employing constraints on interatomic distances and atomic coordination numbers to generate starting interface structures44. Moreover, CALYPSO was used to reveal the CeO2 surface reconstruction. First-generation structures were created with CALYPSO, based on input parameters such as the bulk crystal surface structure, the number of atoms that form the interface, and the thickness of the reconstructed layers45. These structures were then optimized with DFT and compared to experiment.

While most ML applications in surface science use supervised learning methods, reinforcement learning methods are also gaining traction in the structure prediction community. Meldgaard et al. predicted crystal surface reconstructions by using reinforcement learning combined with image recognition46. The learning agent employs a deep NN to decide the placement and type of the next atom in an incomplete structure. The new structure—which represents a new state—is evaluated using DFT and a reward is determined. States and rewards are saved and used by the agent for future atom placements. This algorithm is an extension (to 3D structures) of the ASLA method47 which generated 2D materials and molecules with reinforcement learning.

Open challenges

Despite major advances in accurately predicting global minimum surface structures, current methods are limited to systems with a few hundred atoms27,48. Large-scale systems, such as incommensurate surface structures or realistic catalytic surfaces, containing tens of thousands of atoms remain beyond reach due to the limited efficiency and precision of current inference methods.

Surfaces regularly exhibit multiple (even thousands of) polymorphs with differences in stability below 1 kcal/mol per atom (≈43 meV)21,49, despite exhibiting different properties, such as the work function. DFT is the workhorse method in the field, but different functionals, especially the typically used Generalized Gradient Approximation (GGA), struggle to yield energy computations that are consistently within an accuracy of 1 kcal/mol per atom28,50,51. While more accurate first-principles methods exist, their cost conflicts with the demand for large datasets to improve ML model accuracy. More data-efficient approaches are required. Several examples of transfer learning exist for molecules52,53 and bulk materials54,55. Recently, foundation models (also called universal MLIPs) have been introduced that are trained on large and diverse databases and often deliver adequate accuracy for tasks such as predicting formation energies and performing preliminary geometry optimizations.

Nevertheless, gaps in accuracy and amount of available training data, as well as limited inference efficiency, remain open challenges in the prediction of realistic surface structures.

Thermodynamic stability and compositional phase space under realistic conditions

Understanding thermodynamic stability and compositional phase space enables the prediction of surface structures under realistic conditions, essential for targeted materials design and experimental interpretation. Realistic conditions, representative of common experimental setups and practical applications, typically encompass temperatures around room temperature, pressures spanning from ultra-high vacuum to atmospheric levels, and chemical potentials reflecting the specific surrounding gaseous or liquid environment.

Hörmann et al. developed a surface structure search algorithm called SAMPLE21,29, which enables the prediction of compositional and thermodynamic phase diagrams. SAMPLE can learn energies and work functions56 and generate a comprehensive list of surface structures based on coverage and the number of molecules per unit cell. It uses a pairwise potential fitted with Bayesian linear regression to a training set of a few hundred DFT evaluations. The training set is chosen using optimal design theory to maximize the information gained, addressing a key challenge in ML for surface science: efficient data use. Combined with ab-initio thermodynamics57, SAMPLE provides access to thermodynamic quantities via the partition function. The authors later extended this method to predicting phase diagrams of near-incommensurate surface structures49,58. A similar approach to SAMPLE is GAMMA (Generalized block AsseMbly Machine learning equivalence clAss sampling modeling)30. It employs an Ising-type model based on energy calculations of molecule-molecule and molecule-substrate interactions performed in isolation. This contrasts with SAMPLE, which derives these interactions from formation energy calculations of the adsorbed system. GAMMA explores the configuration space stochastically using an equivalence class sampling algorithm that removes redundant information, allowing it to consider large unit cells and to explore how interaction strength and temperature affect the self-assembly of molecules on substrates.
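
Two central ingredients of such approaches, a linear interaction model fitted in a Bayesian way and thermodynamic weights obtained from the partition function, can be sketched as follows. The motif counts, energies, and hyperparameters below are hypothetical illustrations, not SAMPLE's actual descriptors or data.

```python
import math

def fit_pairwise(X, y, noise=0.01, prior=10.0):
    """Posterior mean of a two-parameter Bayesian linear regression.

    Each row of X counts two interaction motifs (e.g., molecule-substrate
    contacts and molecule-molecule pairs); y holds formation energies."""
    s2, p2 = noise ** 2, prior ** 2
    A00 = sum(x[0] * x[0] for x in X) / s2 + 1.0 / p2
    A01 = sum(x[0] * x[1] for x in X) / s2
    A11 = sum(x[1] * x[1] for x in X) / s2 + 1.0 / p2
    b0 = sum(x[0] * yi for x, yi in zip(X, y)) / s2
    b1 = sum(x[1] * yi for x, yi in zip(X, y)) / s2
    det = A00 * A11 - A01 * A01               # 2x2 solve via Cramer's rule
    return ((b0 * A11 - b1 * A01) / det, (A00 * b1 - A01 * b0) / det)

def boltzmann_weights(energies, kT=0.025):
    """Relative thermal occupations of structures from the partition function."""
    e0 = min(energies)                        # shift for numerical stability
    ws = [math.exp(-(e - e0) / kT) for e in energies]
    Z = sum(ws)
    return [w / Z for w in ws]
```

With the fitted interaction parameters, the energy of any enumerated structure follows from its motif counts, and the Boltzmann weights then yield relative phase occupations at a given temperature.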

A different approach was taken by Ulissi et al., who used GPR to learn the coverage-dependent free energy for IrO2 and MoS2 surface reconstructions59. Training on a small number of DFT calculations, selected using the inherent uncertainty of GPR, allows the authors to construct Pourbaix diagrams that illustrate the stability of surfaces in electrochemistry applications as a function of pH and electrochemical potential.

ML has also facilitated research into the phase behavior of solid-solid interfaces. Moayedpour et al., for instance, studied epitaxial interfaces of tetracyanoquinodimethane on tetrathiafulvalene60. The authors used an improved version of the Ogre61 code to generate possible surface structures and predicted energies using the ANI62 MLIP in conjunction with the D363 vdW correction. The Ogre code facilitates generating ideal (perfectly ordered) interfaces, without considering growth conditions. Huber et al. used gradient-boosted decision trees, based on descriptors that depend only on local properties of the grain boundary, to predict the segregation energy distribution, and thus the segregation isotherm, of metallic solutes at grain boundaries in aluminum64. Their model provides improved predictive power over the common Langmuir-McLean model.

A data-driven, though not strictly ML-based, approach for determining surface thermodynamic properties is nested sampling65. Yang et al. used this method to calculate adsorbate phase diagrams, incorporating all relevant configurational contributions to the free energy66. Nested sampling requires large numbers of energy evaluations; the authors kept this cost tractable by describing the interactions with a Lennard-Jones potential.


Open challenges

Exploring thermodynamic stability and compositional phase space presents challenges similar to those of predicting surface structures (see Structure prediction of surfaces and interfaces). In addition, studying thermodynamic stability requires learning properties such as vibrational enthalpies and free energies, which demand accurate modeling of anharmonic effects, electronic effects, and environmental conditions. Most ML applications in surface thermodynamics focus on predicting energies of phase candidates. While progress has been made in learning thermodynamic properties for molecular or bulk systems67,68, as well as in ML-enhanced CALPHAD (Calculation of Phase Diagrams) modeling for bulk materials69,70, little work exists on directly learning surface vibrational enthalpies or other thermodynamic properties.

Comprehensive, accurate databases for vibrational eigenvalues and eigenvectors, phase transitions, or other thermodynamic properties of surface structures are lacking. While thermodynamic properties such as adsorption energies or vibrational properties can be found in common databases (see Table 3), the amount of data that exists for surfaces is limited. For instance, the recently published Molecular Vibration Explorer71 on the Materials Cloud72 is dedicated to only a specific class of molecules. As a result of this data gap, little work exists on benchmarking ML approaches for thermodynamic properties.

Large-scale electronic property and spectroscopy prediction

Geometric features of surfaces and interfaces, such as defects, reconstructions, incommensurate structures, step edges, and grain boundaries, affect the electronic structure, thereby contributing to measurable electronic transport, reactivity, and spectroscopic signatures. Accurately capturing these effects in simulations requires the prediction of electronic properties in large unit cells containing thousands to tens of thousands of atoms. ML surrogate models of electronic structure and spectroscopic properties are much less developed than ML models for energy and force prediction. To date, only a few proof-of-principle models have been applied in the context of surface science. Their obvious applications lie in high-throughput property prediction and in overcoming the intrinsic scaling limitations of DFT.

Learning level alignment, interfacial orbital hybridization, band bending

Scalar electronic properties can often be learned with methods originally designed to learn energies. For instance, the SAMPLE code21 is able to predict the work function of large molecular superstructures on surfaces. Choudhary et al. followed a similar philosophy when developing InterMat73. This approach uses DFT and NNs to predict band offsets of semiconductor interfaces. A graph NN model is used to predict the valence band maxima and conduction band minima of the respective surfaces, using the atomic structures as input data. Training data was generated with DFT. By aligning the vacuum levels of these surfaces, the band offset is determined (according to Anderson’s rule).
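
Anderson's rule itself reduces to simple arithmetic once the band edges of both surfaces are referenced to a common vacuum level. A minimal sketch (our own function names; energies in eV relative to vacuum) is:

```python
def anderson_offsets(vbm_a, cbm_a, vbm_b, cbm_b):
    """Band offsets by vacuum-level alignment (Anderson's rule).

    Band edges are given in eV relative to the shared vacuum level
    (negative numbers lie below vacuum)."""
    valence_offset = vbm_a - vbm_b
    conduction_offset = cbm_a - cbm_b
    # classify the band lineup
    if (vbm_a < vbm_b and cbm_a > cbm_b) or (vbm_b < vbm_a and cbm_b > cbm_a):
        lineup = "type-I (straddling)"       # one gap lies inside the other
    elif max(vbm_a, vbm_b) < min(cbm_a, cbm_b):
        lineup = "type-II (staggered)"       # gaps overlap, edges staggered
    else:
        lineup = "type-III (broken gap)"     # gaps do not overlap
    return valence_offset, conduction_offset, lineup
```

In a workflow like the one described above, an ML model supplies the band-edge positions for each slab, and this alignment step produces the offset.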

Gerber et al. used a purely data-driven approach for InterMatch74, a high-throughput computational framework designed to efficiently predict charge transfer, strain, and superlattice structures at material interfaces. The algorithm leverages databases of individual bulk materials to streamline the prediction process. By accessing lattice vectors, density of states, and stiffness tensors from the Materials Project, InterMatch estimates interfacial properties. The code was used to study transition metal dichalcogenides and qualitatively reproduce DFT simulations and experiment. This approach enables high-throughput pre-screening.

Hamiltonian and band structure prediction

Nonlinear learning techniques to parameterize Hamiltonians have a long legacy in semi-empirical and tight-binding models, and several end-to-end learning strategies of Slater-Koster integral tables from calculated properties have been proposed for molecules and materials75,76,77. Beyond the limitations of the 2-center integral approximations, various linear, kernel-based78, and deep-learning-based representations of electronic structure have been proposed for molecules79,80. These models typically target the Kohn-Sham Hamiltonian in local orbital representation, which provides an atom-centered representation that can encode the rotational transformation properties of local orbital integrals81 and does not suffer from the non-smoothness and phase issues associated with learning eigenvalues and eigenvectors directly82. Additionally, this representation can be extended to condensed-phase systems, as the Bloch transform of the real-space Hamiltonian readily enables the calculation of the band structure and the electronic density of states (DOS). Examples of recent models include the linear ACEhamiltonians model based on the Atomic Cluster Expansion83, DeepH84, and DeepH-E3 models85. The latter has been applied to two-dimensional materials such as twisted graphene bilayers.
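
The appeal of learning the real-space Hamiltonian is that band structures follow from an inexpensive Bloch transform, H(k) = Σ_R H(R) e^{ikR}. The toy below shows this post-processing step for a one-band, one-dimensional tight-binding chain; the ML part, predicting the real-space matrix elements, is omitted, and the function names are ours.

```python
import cmath
import math

def bloch_hamiltonian(hoppings, k, a=1.0):
    """Bloch transform H(k) = sum_R H(R) exp(i k R a) of real-space
    matrix elements, here for a one-band chain (hoppings maps R -> H(R))."""
    return sum(h * cmath.exp(1j * k * R * a) for R, h in hoppings.items())

def band(hoppings, nk=101, a=1.0):
    """Band energies on a uniform k-grid spanning the first Brillouin zone."""
    ks = [-math.pi / a + 2.0 * math.pi / a * i / (nk - 1) for i in range(nk)]
    return ks, [bloch_hamiltonian(hoppings, k, a).real for k in ks]
```

For a nearest-neighbor chain with on-site energy 0 and hopping t = -1, this reproduces the textbook dispersion E(k) = -2cos(k) with bandwidth 4|t|; in a multi-band setting, H(k) becomes a matrix that is diagonalized at each k.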

Prediction of electronic density of states, the electron density, and spectroscopic properties

The ability to predict the electronic DOS or the electron density directly (rather than through diagonalization of a Hamiltonian) is valuable in studying charge transfer and the relationship between surface structure and electronic properties. SALTED is an approach to learning the electron density expressed as a linear combination of atom-centered basis functions. The corresponding basis coefficients are trained using Gaussian process regression86. This approach has been applied to condensed phase systems, two-dimensional materials, and electrochemical interfaces87. In the latter case, it is able to accurately predict the charge distribution of a metal electrode in contact with an explicit aqueous electrolyte. Several approaches have been proposed that represent the DOS in condensed matter systems either by directly learning a spectrum or by representation in a suitable atom-centered basis88,89,90. Such approaches have been successfully employed for Al metal slabs and are readily applicable in surface science.
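
The target quantity of such DOS models is straightforward to construct from a set of eigenvalues, for example by Gaussian broadening. A minimal sketch of this reference quantity (the broadening width and function name are our own choices) is:

```python
import math

def gaussian_dos(eigenvalues, grid, sigma=0.1):
    """Gaussian-broadened density of states sampled on an energy grid.

    Each eigenvalue contributes a normalized Gaussian of width sigma, so the
    spectrum integrates to the number of states."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((e - ev) / sigma) ** 2)
                       for ev in eigenvalues)
            for e in grid]
```

An ML model either regresses this broadened spectrum directly or predicts expansion coefficients of it in an atom-centered basis, as in the works cited above.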

Related to learning the DOS and electron density are ML models for spectroscopic properties. Methods such as ultraviolet photoelectron spectroscopy (UPS), x-ray photoelectron spectroscopy (XPS), and near-edge x-ray absorption fine structure (NEXAFS) provide invaluable information on the chemical state and bonding environment of atoms in samples. Often, spectroscopic assignment is challenging, and first-principles simulations can provide valuable input. ML methods provide the means to accelerate predictions, with kernel ridge regression and neural network models having been proposed to predict (inverse) photoemission signatures82, XPS91,92, and NEXAFS93,94, but have mostly focused on bulk systems. Additionally, ML approaches enable novel discoveries, such as inverse structure determination based on target spectra95. ML methods have seen widespread application in spectroscopy, which has been summarized in comprehensive reviews on the subject82,96,97,98.

Open challenges

A critical open challenge for electronic structure and spectroscopy surrogate models and their application to surface science is the integration with existing tools and workflows. Most current models focus on bulk systems but should in principle transfer to surface science; however, sufficient training data are lacking. So far, little work has gone into making proof-of-principle models user-friendly, scalable, and well-integrated with electronic structure and simulation software. General future directions have been previously articulated99, and integration with dynamics will be covered in Excited states and nonadiabatic dynamics. Specifically in the context of surface science, these models will face similar challenges as first-principles and semi-empirical methods, as interfaces offer a rich diversity of local atom and bond environments that is hard to capture15. In addition, long-range electrostatic and dispersion contributions to the electronic properties must be considered. We expect developments in this field to accelerate considerably in the coming years, with many ML developments, such as equivariant NNs, transferring straightforwardly into electronic structure surrogates.

Reaction barrier prediction

Most experiments and industrial applications (e.g., surface deposition processes, catalysis) operate away from thermodynamic equilibrium, making it essential to consider kinetic processes at surfaces. This involves identifying metastable states, barriers, transition states, and transition rates at surfaces. The extensive energy and force evaluations needed for such studies demand highly efficient ML approaches.

Transition state search

Transition state search is essential to the study of surface dynamics. The transition energy, for instance, defines reaction rates. Although many successful algorithms have been developed for transition path and transition state search, such as the nudged elastic band (NEB) method, there is still an urgent need for automation and speed-up. Multiple novel methods25,100,101,102,103,104,105,106,107 allow the construction of MLIPs based on first-principles data. Such MLIPs can replace more computationally expensive DFT calculations within the NEB algorithm. In most cases, training MLIPs during the evaluation of NEBs is as computationally costly as directly evaluating the NEBs using DFT codes. However, the resulting models can later be reused for other purposes, e.g., for kinetics. One of the earliest examples of ML-accelerated transition state search was introduced by Peterson108, who created an approximate NN-based PES using the Atomistic Machine-learning Package (Amp)106 to speed up NEB calculations. After finding an initial saddle point, results are confirmed with first-principles calculations, and the model is retrained with the new data points. This is repeated until agreement with the first-principles calculations is reached. The method was employed for two systems: diffusion of a gold atom on an Al(100) surface infused with Pt atoms, and bond rotation in ethane; in both cases, significantly fewer force calls were required than for standard DFT-based NEB. Schaaf et al. presented a general protocol for the prediction of energy barriers of catalytic systems by training MLIPs using active learning based on the energy uncertainty of individual atoms. The protocol was applied to the conversion of CO2 to methanol at an In2O3 surface with a single oxygen vacancy109.

A recent effort toward large pre-trained MLIPs capable of predicting reaction barriers across diverse catalytic systems was enabled by the OC20NEB database110, containing 932 NEB calculations at the GGA-DFT level. Wander et al. used this database within the CatTSunami framework to validate NN-based MLIP models trained on the OC20 database111 (see Accurate dynamics at large time- and length scales): Equiformer v2112 (in 153M- and 31M-parameter variants), GemNet-OC113, PaiNN114, and DimeNet++115 models110. The results of this study are reproduced in Table 1, which shows the convergence and success rates for predicting barriers of molecules detaching from the surface (desorption), dissociating at a surface (dissociation), or exchanging an atom between two reactants (transfer). Equiformer v2 shows the best performance, even in its lighter (31M) variant. PaiNN and DimeNet++ underperformed, in particular for dissociation and transfer reactions, in which more degrees of freedom participate than in desorption reactions.

Table 1 Performance of Equiformer v2 (Eq2), GemNet-OC, PaiNN, and DimeNet++, trained on OC20 database, in predicting reaction (desorption, dissociation or transfer) barriers, tested on validation set based on OC20NEB

To reduce the number of energy and force evaluations required during transition state search, Koistinen, Jónsson, and co-workers introduced GPR-aided NEB, which evaluates only the geometry of the highest-uncertainty image of the predicted minimum energy path. The authors tested the algorithm on two cases: a 2D problem and a heptamer island on a (111) surface described by Morse potentials116,117. An improved GPR-based NEB (ML-NEB) was introduced by Garrido Torres and co-workers118. Their algorithm adjusts the entire minimum energy path after every force evaluation, minimizing the number of evaluations needed to converge. The method was tested on Au diffusion on Al(111), Pt adatom diffusion on a stepped Pt surface, and a Pt heptamer island on Pt(111), demonstrating a remarkable reduction in force evaluations compared to established optimization algorithms. For the two-dimensional Müller-Brown potential, the search for the minimum energy path (Fmax = 0.05 eV/Å) using the climbing-image NEB requires 286 evaluations (11 images), whereas ML-NEB requires only 16 evaluations (Fig. 3). Despite the success of the ML-NEB method in reducing the number of force evaluations needed to find reaction barriers, it scales cubically with the number of atoms; it therefore struggles for higher-dimensional problems, in which the model retraining step becomes a bottleneck that significantly increases the total time of the NEB calculation.
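
The Müller-Brown surface used in this benchmark is a standard two-dimensional analytic test PES. A minimal implementation with its textbook parameters, plus a simple steepest-descent check of one of its known minima, is sketched below; this is illustrative only and does not reproduce the NEB calculations of ref. 118.

```python
import math

# Standard Mueller-Brown parameters: V = sum_k A_k exp(a_k dx^2 + b_k dx dy + c_k dy^2)
A  = (-200.0, -100.0, -170.0, 15.0)
a  = (-1.0, -1.0, -6.5, 0.7)
b  = (0.0, 0.0, 11.0, 0.6)
c  = (-10.0, -10.0, -6.5, 0.7)
X0 = (1.0, 0.0, -0.5, -1.0)
Y0 = (0.0, 0.5, 1.5, 1.0)

def mueller_brown(x, y):
    """Analytic Mueller-Brown potential energy at (x, y)."""
    return sum(A[k] * math.exp(a[k] * (x - X0[k]) ** 2
                               + b[k] * (x - X0[k]) * (y - Y0[k])
                               + c[k] * (y - Y0[k]) ** 2)
               for k in range(4))

def gradient(x, y, h=1e-6):
    """Central-difference gradient; adequate for this illustration."""
    gx = (mueller_brown(x + h, y) - mueller_brown(x - h, y)) / (2 * h)
    gy = (mueller_brown(x, y + h) - mueller_brown(x, y - h)) / (2 * h)
    return gx, gy

def descend(x, y, step=1e-4, iters=20000):
    """Plain steepest descent into the nearest local minimum."""
    for _ in range(iters):
        gx, gy = gradient(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y
```

Because the potential and its gradient are analytic and cheap, this surface is a convenient testbed for comparing path-optimization strategies before moving to expensive first-principles forces.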

Fig. 3: Conventional and ML accelerated minimum energy path search.
figure 3

Minimum energy paths evaluated with climbing image NEB (above) and ML-NEB (below) based on the results from ref. 118 for the two-dimensional Müller-Brown PES. Black crosses correspond to all structures that were evaluated in the optimization process, and white crosses represent the final path.
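The uncertainty-driven idea behind these GPR-accelerated searches can be sketched in a few lines. The following is an illustrative loop only, not the ML-NEB algorithm of ref. 118: images are held fixed on a straight path across the Müller-Brown surface, a Gaussian process surrogate is fit to the points evaluated so far, and each new true evaluation is spent on the image with the largest predicted uncertainty.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Mueller-Brown model potential (standard parameters)
A  = np.array([-200., -100., -170., 15.])
a  = np.array([-1., -1., -6.5, 0.7])
b  = np.array([0., 0., 11., 0.6])
c  = np.array([-10., -10., -6.5, 0.7])
x0 = np.array([1., 0., -0.5, -1.])
y0 = np.array([0., 0.5, 1.5, 1.])

def mueller_brown(p):
    x, y = p
    return np.sum(A * np.exp(a*(x-x0)**2 + b*(x-x0)*(y-y0) + c*(y-y0)**2))

# Straight-line initial path between the two outer minima (11 images)
start, end = np.array([-0.558, 1.442]), np.array([0.623, 0.028])
path = np.linspace(start, end, 11)

# Seed the surrogate with the endpoints only
X = [start, end]
y = [mueller_brown(start), mueller_brown(end)]

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                               alpha=1e-6, normalize_y=True)
for _ in range(16):              # budget of 16 true evaluations
    gpr.fit(np.array(X), np.array(y))
    _, sigma = gpr.predict(path, return_std=True)
    i = int(np.argmax(sigma))    # most uncertain image along the path
    X.append(path[i]); y.append(mueller_brown(path[i]))

energies, sigma = gpr.predict(path, return_std=True)
```

In an actual ML-NEB run the path itself is relaxed on the surrogate between true evaluations; here only the acquisition step is shown.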

Direct activation energy prediction

Apart from deriving activation energies and rates from transition states, both could in principle also be predicted directly, without a prior search for the transition state structure. An ML-based direct prediction of activation energies in complex catalytic systems was proposed by Singh et al.119. They employed a forward-search algorithm to select both linear and non-linear features and compared various ML techniques (linear regression, Gaussian process, random forest) for their effectiveness in predicting reaction rates. Focusing on the dehydrogenation and dissociation of N2 and O2 on surfaces, a polynomial-feature-based linear regression model performed best, yielding roughly a twofold accuracy improvement over previous single-parameter linear-regression models. Later, Komp et al. employed a more complex ML model based on deep NNs to predict the full quantum reaction rate constant for one-dimensional reactive pathways using roughly 1.5 million training data points. The authors tested the model on the diffusion of H on Ni(100) and other non-surface reactions120.
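A polynomial-feature linear-regression strategy of this kind can be sketched with scikit-learn on synthetic data. The descriptors (a reaction energy dE and a binding energy Eb) and the BEP-like target below are invented for illustration and are not the features of ref. 119.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical per-reaction descriptors: reaction energy dE and
# adsorbate binding energy Eb (synthetic, illustrative only)
n = 200
dE = rng.uniform(-2.0, 2.0, n)
Eb = rng.uniform(-3.0, 0.0, n)
X = np.column_stack([dE, Eb])

# Toy "true" barrier: BEP-like linear term plus a weak nonlinear coupling
Ea = 0.9 + 0.5*dE - 0.2*Eb + 0.1*dE*Eb + rng.normal(0, 0.02, n)

# Degree-2 polynomial features feeding an ordinary linear regression
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X[:150], Ea[:150])
mae = np.mean(np.abs(model.predict(X[150:]) - Ea[150:]))
```

The polynomial expansion lets a linear model capture descriptor couplings (here dE*Eb) while remaining cheap and interpretable.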

Complementary experimental data can significantly enhance the predictions of kinetic properties. For example, Smith et al. proposed using experimental descriptor data combined with dimensionality reduction, principal component analysis, and NN to predict reaction rates of the water-gas shift reaction using different catalysts and reaction conditions121.

Open challenges

The prediction of reaction barriers has advanced notably with ML techniques. However, accurate and efficient direct reaction barrier prediction for molecules, materials, and surface chemistry has not yet been achieved. Promising methods, e.g., ML-NEB118, have shown success in predicting reaction barriers of lower-dimensional systems; however, further development is needed to deliver such improvements robustly and consistently for higher-dimensional systems. NEB calculations typically demand substantial human intervention. Further development is necessary to create automated workflows that can efficiently identify barriers and transition states. Moreover, few comprehensive datasets, the OC20NEB database110 being an exception, exist for barriers and transition paths, which would facilitate the development and testing of current and new methods.

Multiscale kinetic simulations

Kinetic simulations are essential for understanding surface reaction mechanisms and growth. Fundamental challenges in kinetic simulations include capturing processes that occur over vastly different time and length scales, accounting for a large chemical reaction space, and quantifying uncertainties and understanding how they propagate across scales. ML can enhance large-scale kinetic simulations by integrating with mean-field microkinetic models, kinetic Monte Carlo (kMC) methods, and reaction networks. While microkinetic models are more computationally efficient, kMC simulations fully capture the reactive site dependence and fluctuations, and reaction networks offer a framework for organizing complex reaction pathways and tracking the evolution of species over time122.

Mean-field microkinetic modeling

By assuming a uniform distribution of reactants and intermediates on the surface, mean-field microkinetic modeling simplifies the treatment of coverage-dependent effects and complex reaction networks. A mean-field model is the basis of the RMG-CAT software developed by Goldsmith and West123. RMG-CAT employs graph theory supported by least-squares regression to simulate microkinetic mechanisms for heterogeneous catalysis and has been successfully applied to model the dry reforming of methane on a Ni surface. Tian and Rangarajan introduced a mean-field microkinetic approach that utilizes NNs (NN-MK). In their approach, rates of elementary steps in the fast diffusion limit are generated with lattice MC simulations and then passed into an NN-based model that maps coverages to reaction rates for the entire reaction network. This model was then used in the mean-field microkinetic model. They demonstrated the capabilities of the model by studying CO oxidation, reaching accuracy comparable to kMC simulations124,125.
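A minimal mean-field microkinetic model illustrates the setting into which such ML rate surrogates are plugged. The sketch below integrates toy Langmuir-Hinshelwood rate equations for CO oxidation (CO adsorption/desorption, dissociative O2 adsorption, surface reaction); all rate constants are arbitrary illustrative values, not those of refs. 124,125.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative rate constants (arbitrary units)
k_CO_ads, k_CO_des = 1.0, 0.1   # CO adsorption / desorption
k_O2_ads           = 0.5        # dissociative O2 adsorption
k_rxn              = 2.0        # CO* + O* -> CO2 (desorbs immediately)

def rhs(t, theta):
    """Mean-field rate equations for the coverages (theta_CO, theta_O)."""
    th_CO, th_O = theta
    free = 1.0 - th_CO - th_O                      # empty-site fraction
    d_CO = k_CO_ads*free - k_CO_des*th_CO - k_rxn*th_CO*th_O
    d_O  = 2*k_O2_ads*free**2 - k_rxn*th_CO*th_O   # O2 needs two sites
    return [d_CO, d_O]

# Integrate from an empty surface to (near) steady state
sol = solve_ivp(rhs, (0, 200), [0.0, 0.0], rtol=1e-8, atol=1e-10)
th_CO, th_O = sol.y[:, -1]
tof = k_rxn * th_CO * th_O   # steady-state CO2 turnover frequency
```

In an NN-MK-style scheme, the mass-action rate expressions above would be replaced by a learned coverage-to-rate mapping.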

Kinetic Monte-Carlo simulations

A more detailed approach to modeling kinetic processes is kMC, which captures local fluctuations and spatial correlations. An early example of ML-based kMC is the approach developed by Sastry et al.126. They used genetic programming to construct a symbolic regression for reaction barriers (given a small set of calculated barriers) to enable kMC simulations of vacancy-assisted migration on an fcc CuxCo1-x surface. Djurabekova et al. developed a simple NN coupled with kMC to efficiently predict vacancy migration energies, reducing computational costs and enabling the exploration of kinetic pathways during Cu precipitation in Fe127. Castin et al.128 developed a similar approach, employing an NN model to predict vacancy migration energies based on NEB data, allowing the acceleration of kMC simulations of thermal annealing in Fe-20%Cr alloys, and later for precipitation studies in Fe-based alloys129,130 and point-defect transition rates in FeCu alloys131. Castin et al. further extended this method with high-dimensional NN MLIPs to predict vacancy migration barriers in kMC, tested on FeCu and FeCr alloys132.

Apart from using ML methods to predict reaction rates and to accelerate kMC simulations, it has also been proposed to entirely replace kMC with an ML-based model. Chaffart et al. utilized NNs to predict coefficients of stochastic partial differential equations as a function of substrate temperature and surface precursor fraction. The authors then combined the resulting method with continuum transport equations to predict epitaxial thin film evolution and growth from a gaseous molecular precursor133. Building on this approach, Kimaev et al. introduced an NN that can completely replace the stochastic multiscale model, which coupled kMC with partial differential equations to simulate thin film formation by chemical vapor deposition134,135,136. Independently, Ding et al. developed an integrated multiscale recurrent NN-based model for gas-phase transport profiles and microscopic surface dynamics using kMC and validated it for the plasma-enhanced atomic layer deposition of HfO2 thin films137. The improved efficiency of MLIPs has recently enabled direct modeling of some kinetic processes. For instance, Zhou et al. employed a GPR-based active learning tool for constructing MLIPs, named Flare107, to study reconstruction kinetics at the PdAu(111) surface induced by CO138. To efficiently learn the relation between reaction barriers and the output of the kMC model, Soleymanibrojeni et al. coupled an active learning process, built around a Gaussian process classification model, with kMC to model solid-electrolyte interphase formation in Li-ion batteries, constructing the initial dataset with a design-of-experiments approach139.
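The role of an ML barrier model inside a kMC loop can be sketched as a single rejection-free (BKL-type) step, in which the barriers come from a stand-in surrogate function of a local-environment fingerprint. The surrogate, the Arrhenius prefactor, and the candidate events are all illustrative assumptions.

```python
import math
import random

random.seed(1)
kB_T = 0.0257  # eV, room temperature

def surrogate_barrier(env):
    """Stand-in for an ML model: barrier (eV) as a function of a local
    environment fingerprint (here simply the neighbor count)."""
    return 0.5 + 0.05 * env

def kmc_step(events, t):
    """One rejection-free kMC step: pick an event with probability
    proportional to its rate, advance time by -ln(u)/sum(rates)."""
    rates = [1e13 * math.exp(-surrogate_barrier(env) / kB_T)
             for env in events]
    total = sum(rates)
    r = random.random() * total
    acc, chosen = 0.0, len(rates) - 1
    for i, k in enumerate(rates):
        acc += k
        if r < acc:
            chosen = i
            break
    dt = -math.log(random.random()) / total
    return chosen, t + dt

# Three candidate hops distinguished by their neighbor counts
events = [3, 4, 5]
chosen, t = kmc_step(events, 0.0)
```

In the ML-accelerated schemes above, `surrogate_barrier` would be a trained regressor evaluated on a proper structural fingerprint, refreshed by active learning when the model is uncertain.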

Reaction networks

ML models have shown great potential in uncovering the complexity of reaction networks, which limits the modeling of many processes. Ulissi et al. presented a GPR-based workflow that enables predicting complex reaction pathways, applied to syngas conversion on Rh(111)140. Another approach was shown by Liu and co-workers, who employed stochastic surface walking to construct global NN-based PESs (SSW-NN)141 to study the water-gas shift reaction on the Cu(111) surface. The model was trained on DFT data containing 375,000 minima and more than 10,000 reaction pairs, and enabled the authors to find the lowest-energy pathway for the entire reaction network142. This approach was later extended to an end-to-end framework for the activity prediction of heterogeneous catalytic systems (AI-Cat), in which two NN models are used: the first predicts possible reaction patterns, and the second predicts the reaction barrier and energy. The NN models are employed in a Monte Carlo tree search to find low-energy pathways through the reaction network. The authors applied AI-Cat to study the selectivity of glycerol hydrogenolysis on Cu surfaces143.

Open challenges

Key challenges in kMC and microkinetic modeling are to ensure that the chemical reaction space (sites, elementary processes) adequately represents the system and that the underlying rate constants are valid for all relevant regimes. Rapid developments in data-driven approaches and global reaction exploration models are underway that will benefit the former123,144. Robust and transferable ML surrogates that can rapidly predict rate constants as a function of various conditions will dramatically benefit the latter.

The propagation of kMC and microkinetic models featuring processes with vastly different time scales is a challenge that continues to be an active research area. Approaches that aim to replace kMC with ML-based models have shown promise. Additionally, sophisticated variable-coefficient differential equation solvers, as well as robust uncertainty quantification and sensitivity analysis, will be crucial for advancing this field145.

Kinetic models not only couple to first principles data via input parameters but also provide information for mass transport modeling of macroscopic surface models. ML models will support not only parameterization but also uncertainty quantification across different scales.

Accurate dynamics at large time- and length scales

Simulating surface dynamics is challenging due to the high dimensionality, the complex electronic structure of metallic surfaces, and the frequent requirement for ensemble averaging over thousands of molecular dynamics (MD) trajectories. Accurate modeling of dynamics at surfaces requires models that match the accuracy of first principles methods. However, employing ab initio MD for experimentally relevant systems is infeasible in most cases, as the long time scales and large system sizes required lead to prohibitive computational costs. This challenge has driven the development of ML interatomic potentials (MLIPs) trained to first principles accuracy.

Interatomic potentials

Simulations of dynamics on surfaces require highly efficient surrogate models for energies, forces, and other properties. Fundamental to this are MLIPs. We note that many MLIPs are not solely dedicated to surface structure prediction and that their construction has recently been reviewed extensively in other areas of computational materials modeling7,8,146,147,148. Most existing methods to construct MLIPs are based on NN-potentials or Bayesian regression methods. NN-potentials are unbiased and sufficiently flexible to learn from electronic structure data. They typically require on the order of 10^4 training data points for high-dimensional PESs with tens to hundreds of degrees of freedom. Inference, i.e., evaluation of MLIPs, typically requires significantly greater computational effort than empirical force field potentials. Boes et al. highlighted the advantages and disadvantages of NNs over force fields by comparing a Behler-Parrinello net to the ReaxFF149 force field150. While the NN performed significantly better than ReaxFF for bulk properties, it required significantly more (approximately ten times as much) training data to obtain an accurate model for surface structures and adatom diffusion barriers. Bayesian regression methods are typically less data-hungry but provide less flexibility and transferability21,25,27.

Reactive gas-surface scattering simulations have long motivated the development of interatomic potentials. Early reactive PES construction methods include the corrugation reducing procedure151, modified Shepard interpolation152,153, or interpolation with permutation invariant polynomials154,155,156. These methods enable low computational costs of MD simulations; however, they require a frozen (static) surface approximation, excluding the explicit treatment of surface atom motion. Most recent dynamical simulations employ MLIPs, which deliver ab initio accuracy and high efficiency while allowing the inclusion of all degrees of freedom in simulations. Therefore, MLIPs have come to dominate surface dynamics research. Early MLIPs for surface dynamics date back to 1995, when Blank et al. employed simple NNs to construct PESs for CO adsorbed on a Ni(111) surface and H2 on a Si(100) surface, utilizing the latter for quantum transition state theory rate calculations157. A highly successful MLIP based on Bayesian regression is GAP25. GAP allows interpolating the PES by using GPR with a descriptor based on the local atomic density. GAP was originally applied to bulk crystals and was only later applied to problems in surface science26,158,159. A milestone in the development of MLIPs was the high-dimensional, atom-centered NNs introduced by Behler and Parrinello. Their method calculates the total energy by summing atomic energy contributions, with interatomic interactions described within a defined cutoff radius160. This atom-centered energy decomposition with finite cutoffs has since become a standard approach in most contemporary MLIPs.

$$E=\sum_{i=1}^{N_{\mathrm{at}}}E_{i}=\sum_{i=1}^{N_{\mathrm{at}}}\mathrm{NN}_{i}$$
(1)
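Equation (1) can be sketched directly: each atom receives a descriptor of its local environment within a cutoff, a small atomic network maps the descriptor to an atomic energy, and the atomic energies are summed. The toy radial symmetry functions and fixed random weights below are illustrative stand-ins for a trained Behler-Parrinello model; the sketch only demonstrates that the resulting total energy is permutation and translation invariant.

```python
import numpy as np

rng = np.random.default_rng(0)
cutoff = 4.0

def descriptor(pos, i):
    """Toy radial symmetry functions for atom i: Gaussians of neighbor
    distances, smoothly switched off at the cutoff."""
    d = np.linalg.norm(pos - pos[i], axis=1)
    d = d[(d > 1e-8) & (d < cutoff)]                # neighbors only
    fc = 0.5 * (np.cos(np.pi * d / cutoff) + 1.0)   # cutoff function
    etas = np.array([0.5, 1.0, 2.0])
    return np.array([np.sum(np.exp(-eta * d**2) * fc) for eta in etas])

# A tiny fixed-weight atomic network NN_i: 3 -> 4 -> 1
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)

def atomic_energy(g):
    return (np.tanh(g @ W1 + b1) @ W2 + b2).item()

def total_energy(pos):
    """Eq. (1): sum of atom-centered NN energy contributions."""
    return sum(atomic_energy(descriptor(pos, i)) for i in range(len(pos)))

pos = rng.uniform(0, 5, size=(6, 3))
E = total_energy(pos)
E_perm = total_energy(pos[::-1])   # permuting atoms must not change E
```

Because the descriptor depends only on interatomic distances within the cutoff, the summed energy inherits permutation and translation invariance by construction.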

A key challenge underpinning interatomic potentials is the selection of good atom-centered structural descriptors (or features). A descriptor is a mathematical representation of the local atomic environment, encoding structural and chemical information to enable a smooth ML representation of energies, forces, and other properties as a function of the atomic positions and species. To be physically meaningful, a descriptor must remain invariant under the symmetry transformations of the system, such as rigid rotations, translations, and permutations. Correctly accounting for symmetries is crucial in MLIPs, especially for surfaces, which often exhibit multiple symmetries. Behler et al. were also among the first to apply NNs in surface science. They used general symmetry functions based on atomic Fourier terms, replacing the inconvenient empirical functions presented by previous authors to describe the interaction between O2 and a frozen Al(111) surface161. Jiang, Guo, and colleagues introduced the permutation invariant polynomial NN (PIP-NN), which uses permutation invariant polynomials as inputs, preserving both molecular permutation symmetry and surface translational symmetry. They applied this approach to simulate H2 on frozen Cu(111) and Ag(111) surfaces162,163,164.

Methods based on high-dimensional NNs (HDNN) by Behler and Parrinello160 allow the inclusion of surface degrees of freedom and form the most common type of MLIP applied for dynamics at surfaces. Guo and co-workers were the first to use HDNNs to investigate HCl scattering events on a dynamic Au(111) surface165,166. Gerrits employed the HDNN code RuNNer100 to model H2 reactions on a curved Pt crystal. This PES model, trained on data covering multiple Pt surfaces, allowed for larger unit cells without significant accuracy loss, making it suitable for simulating realistic crystal systems167. HDNNs were also used to study solid-liquid systems, in particular, for the dynamics of water at metal-based surfaces. Natarajan and Behler employed HDNN-based PESs to investigate water-copper interfaces for low-index Cu surfaces168 and stepped surfaces, including surface defects169.

Jiang and co-workers developed the embedded atom NN (EANN), which was shown to provide highly efficient force evaluations compared to previous HDNN-based approaches and enabled accurate model construction with fewer data points101. This method has been applied, for example, to simulate the scattering dynamics of NO on Ag(111)170 or CO adsorption on Au(111)171,172. Later, a piece-wise EANN (PEANN) was proposed, in which the original Gaussian-type orbitals used for descriptors are replaced with piece-wise switching functions, leading to significant improvement in efficiency when applied to dissociative chemisorption of CH4 on flat Ir(111) and stepped Ir(332)173. A notable improvement to EANN was introduced in recursive EANN (REANN), in which a message-passing scheme was implemented for orbital coefficients, leading to increased accuracy and transferability of the models. REANN was applied, e.g., to model bulk SiO2 or the dissociative chemisorption of CO2 at Ni(100)174,175.

Equivariant interatomic potentials

MLIPs based on deep message-passing NNs (MPNNs) have been gaining significant traction. These models learn not only a target property but also the input descriptor in an end-to-end fashion, promising a reduction of bias compared to hand-crafted descriptors. One of the first deep atom-centered message-passing NNs is the SchNet model103,104. A recent milestone is the advent of equivariant MPNNs, whose outputs transform consistently with transformations of the inputs, allowing them to capture the complex geometric relationships between atoms. Batzner et al. developed neural equivariant interatomic potentials (NequIP), an equivariant MLIP based on MPNNs, which shows excellent accuracy in predicting energies and forces. It incorporates symmetry awareness by using E(3)-equivariant convolutions for interactions of geometric tensors. With this, the algorithm significantly reduces the amount of required data (by up to 3 orders of magnitude) compared to other contemporary algorithms. The authors employed NequIP to study formate dehydrogenation on a Cu(110) surface102. Stark et al. explored the importance of equivariance by comparing invariant (SchNet103) and equivariant (PaiNN114) MPNNs for H2 dissociative adsorption on Cu surfaces, demonstrating that equivariant features enhance atomic environment descriptiveness, leading to more accurate models and smoother energy landscapes while requiring fewer training data points176.

Foundation models

Recently, several foundation models (also called universal MLIPs), including MACE-MP-0177, ANI/EFP178, M3GNet-DIRECT179, ALIGNN-FF180, and CHGNet181, have been introduced. These models are trained on a vast number of chemical species and data points with the ambition of providing transferable predictions across a diverse range of systems. They have been successfully employed in modeling heterogeneous catalysis or water/SiO2 interface dynamics177. Foundation models are pre-trained on large databases and often deliver qualitatively accurate predictions for tasks such as geometry optimization and formation energy prediction182. Foundation models can suffer from softening (systematic underprediction of energies and forces), which can be overcome by retraining (or fine-tuning) on additional first-principles datasets (as few as 100 data points)183,184. Conversely, retraining foundation models may reduce their universality and lead to decreased performance on certain systems, including those present in the refinement dataset184. However, in many cases, trading some degree of universality for accuracy in particular applications can be desirable. An example of such a case, for gas-surface dynamics (H2 dissociation at Cu surfaces), was recently proposed by Radova et al., who introduced the MACE-freeze method185, in which transfer learning with partially frozen model parameters is applied to the MACE-MP foundation model. The approach provided highly data-efficient MLIPs, achieving accuracy similar to from-scratch-trained models with only 10-20% of the data points. The authors also showed that training a light, linear model (ACEpotentials186) on data generated with the MACE-freeze model as ground truth can lead not only to better efficiency (17 times faster force evaluation compared to the “small” MACE-MP model) but also to improved accuracy relative to from-scratch-trained linear models.
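The frozen-parameter transfer-learning idea can be illustrated generically (this is not the MACE-freeze implementation): a "foundation" feature extractor is kept fixed, and only a lightweight readout is refit on a small new dataset. All names and the toy target below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Foundation" feature extractor: a random, frozen hidden layer
W_frozen = rng.normal(scale=2.0, size=(1, 32))
b_frozen = rng.normal(size=32)
features = lambda x: np.tanh(x @ W_frozen + b_frozen)

# Fine-tuning = refitting only the linear readout on a small dataset
X_small = rng.uniform(-1, 1, (100, 1))
y_small = np.sin(3 * X_small).ravel()          # toy target "PES"

Phi = features(X_small)                         # frozen features
w_head, *_ = np.linalg.lstsq(Phi, y_small, rcond=None)

X_test = rng.uniform(-1, 1, (100, 1))
pred = features(X_test) @ w_head
mae = np.mean(np.abs(pred - np.sin(3 * X_test).ravel()))
```

Freezing the body of the model restricts optimization to a small, well-conditioned parameter set, which is what makes fine-tuning on only a fraction of the data feasible.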

Data generation

Training of MLIPs requires a large number of data points, the generation of which is computationally expensive, particularly due to the high computational cost associated with surface slab calculations. Adaptive sampling, also known as on-the-fly training or active learning, seeks to select the most informative data points to optimize the learning process with minimal data. Two strategies are particularly prevalent in surface science: uncertainty-based sampling and diversity sampling. In uncertainty-driven sampling, the model selects data points where its predictions are least confident, often focusing on points near decision boundaries or the inherent uncertainties of a Bayesian model27,187. Uncertainty-based sampling may involve balancing exploration and exploitation, as seen in Bayesian optimization methods. While leveraging uncertainties for exploration, such methods can also enhance exploitation, e.g., via a thermodynamic likelihood188. Exploration can also be enhanced through additional techniques, e.g., by perturbing geometries189. Another uncertainty-based method is query-by-committee190, where multiple models probe data points, and new training points are selected where the models show the most disagreement. Query-by-committee was employed by Artrith and Behler in their development of NN potentials for copper surfaces191. Since then, its popularity has grown significantly, and we direct the interested reader to recent topical reviews192,193. Diversity sampling follows a different strategy by ensuring that selected data points are spread across the input space, preventing redundancy and ensuring a broad representation of the dataset. When a large amount of ab initio MD data is available, trajectory subsampling techniques can also be employed to extract the most informative data points from numerous trajectories194.
Interestingly, a combination of diversity sampling (clustering) and uncertainty-based sampling can significantly improve learning rates of MLIPs compared to using exclusively one or the other method195.
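A minimal query-by-committee sketch: a committee of models (here, polynomials of different degrees standing in for an MLIP ensemble) is trained on a labeled region, and the next configuration to label is the candidate on which the committee disagrees most.

```python
import numpy as np

f = lambda x: np.sin(3 * x)             # stand-in for an expensive PES
x_train = np.linspace(0.0, 1.0, 15)     # labeled region
y_train = f(x_train)
x_pool = np.linspace(0.0, 3.0, 301)     # unlabeled candidate pool

# Committee: models of different flexibility fit to the same data
committee = [np.polyfit(x_train, y_train, deg) for deg in (2, 3, 4, 5)]
preds = np.stack([np.polyval(c, x_pool) for c in committee])

disagreement = preds.std(axis=0)         # committee spread per candidate
query = x_pool[np.argmax(disagreement)]  # label this point next
```

The committee agrees inside the well-sampled region and diverges strongly in the extrapolation region, so the query lands where new ab initio data is most informative.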

Inclusion of long-range effects

Long-range dispersion interactions often constitute one of the dominant interactions at hybrid organic/inorganic interfaces. For instance, the perylenetetracarboxylic dianhydride (PTCDA) molecule on Ag(111) is predominantly bonded by long-range interactions, despite also being anchored by peripheral oxygen atoms15,28. These interactions are non-local by nature, and describing them within an MLIP may require non-local descriptors or larger cutoffs than otherwise needed, which can render ML models inefficient. Using SchNet, Westermayr et al. developed a long-range dispersion-inclusive MLIP that facilitates structure search and geometry optimization of organic/inorganic interfaces196. They used two NNs, one to learn the vdW-free interaction energy and one to learn the Hirshfeld volume ratios, allowing the calculation of vdW interactions as a correction to the vdW-free NN. Around the same time, Piquemal197 and Caro198 introduced similar approaches. Other long-range-separated MLIPs include the Long-Distance Equivariant (LODE) method199,200, which employs local descriptors to represent Coulombic and other asymptotically decaying potentials around atoms, and the Latent Ewald Summation (LES)201, designed specifically to address long-range interactions in atomistic systems. Long-range electrostatics, in addition to long-range dispersion, have been included in the HDNN models by Behler202 and the SpookyNet model203.
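The two-model construction can be sketched schematically: a short-range model supplies the vdW-free energy, and a separate pairwise damped -C6/r^6 term, of the kind a second network can parameterize via predicted Hirshfeld volume ratios, adds the dispersion correction. Both energy terms below are toy stand-ins, not the models of ref. 196.

```python
import numpy as np

def short_range_energy(pos):
    """Stand-in for the vdW-free MLIP (e.g. a trained NN); here a toy
    pairwise exponential repulsion so the example is self-contained."""
    d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    iu = np.triu_indices(len(pos), k=1)
    return np.sum(np.exp(-2.0 * d[iu]))

def vdw_correction(pos, c6, d0=3.0, steepness=6.0):
    """Pairwise -C6/r^6 dispersion with a Fermi-type damping function;
    in the two-NN scheme, the per-atom C6 values would be derived from
    predicted Hirshfeld volume ratios (here they are given directly)."""
    e = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(pos[i] - pos[j])
            c6ij = np.sqrt(c6[i] * c6[j])        # simple combination rule
            damp = 1.0 / (1.0 + np.exp(-steepness * (r / d0 - 1.0)))
            e -= damp * c6ij / r**6
    return e

pos = np.array([[0., 0., 0.], [0., 0., 4.0], [0., 0., 8.0]])
c6 = np.array([10.0, 10.0, 10.0])
E = short_range_energy(pos) + vdw_correction(pos, c6)
```

Separating the asymptotic tail from the short-range model keeps the MLIP cutoff small while still capturing interactions that decay far beyond it.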

Benchmarking potential accuracy

The accuracy of MLIPs is often assessed using standardized datasets such as MD17204, QM9205, OC20111, and OC22206. The MD17 database contains MD trajectories for ten small organic molecules. The QM9 database contains relaxed geometries of 134,000 small, stable organic molecules composed of C, H, O, N, and F, for which geometric, energetic, electronic, and thermodynamic properties have been computed. Table 2 presents mean absolute errors for predicted energies. Almost all published MLIP models have been assessed on the MD17 or QM9 datasets, which both cover only organic molecules and are not directly relevant to surface science problems. More applicable to surface science is the open catalyst (OC) project111,206. The OC20 dataset comprises over 264 million data points, featuring relaxed and unrelaxed structures, adsorption energies, and atomic forces for various catalyst-adsorbate interactions. OC22 further extends OC20 with 9.8 million data points of complex reaction pathways and dynamic simulations, adding temporal data and focusing on reaction kinetics, making it a valuable resource for studying atomic-scale catalytic processes. Table 2 provides a comprehensive summary of literature benchmarks on these datasets (Supplementary Table 1 in the Supplementary information provides additional published benchmarks). In the case of the OC20/22 datasets, we focus on two tasks highly relevant to surface science: (A) the prediction of the DFT total energy and forces for a given structure – structure to energy and forces (S2EF); and (B) the prediction of the relaxed DFT total energy for a given initial structure – initial structure to relaxed energy (IS2RE).

Table 2 MAE in meV for the prediction of energies (and geometry optimizations in case of OC20 IS2RE); Blue and red colors indicate small and large MAEs respectively; MLIPs trained and tested on MD17 (1000 training data points), QM9 (110,000 training data points), OC20 (460,328 training data points), and OC22 (8,225,293 training data points); MAEs on OC20/22 for out-of-domain set; MAEs are taken from102,112,175,206,253,254,255,256,257,258,259,260,261,262,263,264; A more detailed table with MAEs can be found in the Supporting Information25,62,101,102,103,112,113,114,115,175,255,258,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279

Benchmarking prediction speed of potentials

Often the primary measure of the quality of MLIPs is the prediction accuracy, usually quantified by the mean absolute error (MAE) and the root mean square error (RMSE) on a test set. However, time-to-solution can be equally important given the need for accurate statistical sampling of nonequilibrium gas-surface dynamics. Stark et al. benchmarked MLIPs such as ACE, MACE, REANN, and PaiNN207, evaluating accuracy via statistical ensemble averaging and comparing evaluation speeds on CPU and GPU architectures (Fig. 4). MACE and REANN models achieved good accuracy and efficiency on CPU architectures, which are most suitable for massively parallel multi-trajectory dynamics. MACE provided superior performance on GPU architectures. As a linear model, ACE is the fastest on CPU architectures, but providing training data to ensure consistent accuracy is more challenging than for the other MLIPs.

Fig. 4: Comparison of MLIP accuracy and time-to-solution.
figure 4

Force test RMSEs in meV/Å plotted against force evaluation time in milliseconds from ref. 207 obtained with models based on different architectures, such as PaiNN (squares), REANN (circles), MACE (triangles), and ACE (stars). Data points and error bars correspond to the RMSE average and standard deviation over 5-fold cross-validation splits. The striped circle corresponds to the model obtained with the newer REANN version, which uses a Fortran-based neighbor list calculator. Black markers indicate models evaluated using GPU architecture. The CPU evaluation times were calculated using a single AMD EPYC 7742 (Rome) 2.25 GHz CPU core. GPU evaluation times were obtained with an NVIDIA V100 GPU.
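Accuracy and time-to-solution can be reported together with a few lines of bookkeeping. The sketch below computes force MAE/RMSE in meV/Å and mean wall-clock evaluation time; the random arrays stand in for reference and model forces, and the numbers are not those of ref. 207.

```python
import time
import numpy as np

def benchmark(model_forces, ref_forces, eval_fn, pos, n_repeat=10):
    """Report force RMSE/MAE (meV/A) and mean wall-clock evaluation
    time (ms) for a force-evaluation callable."""
    err = model_forces - ref_forces
    rmse = 1000.0 * np.sqrt(np.mean(err**2))
    mae = 1000.0 * np.mean(np.abs(err))
    t0 = time.perf_counter()
    for _ in range(n_repeat):
        eval_fn(pos)
    dt_ms = 1000.0 * (time.perf_counter() - t0) / n_repeat
    return rmse, mae, dt_ms

# Toy stand-ins: "reference" forces vs slightly perturbed "model" forces
rng = np.random.default_rng(0)
ref = rng.normal(size=(64, 3))
model = ref + rng.normal(scale=0.02, size=ref.shape)   # ~20 meV/A noise
rmse, mae, dt = benchmark(model, ref, lambda p: np.sum(p**2),
                          rng.normal(size=(64, 3)))
```

Reporting both axes, as in Fig. 4, makes the accuracy/speed trade-off between architectures explicit rather than implicit.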

Open challenges

What is evident from the overview in Table 2 is the existence of a benchmarking gap: most interatomic potentials are evaluated on molecular datasets rather than surface structures. Notably, various potentials, such as ACE, MACE, or REANN, are routinely applied in surface dynamics but have not yet been benchmarked on the OC20/22 datasets. Moreover, there exists an accuracy gap: for example, the OC20/22 datasets were computed at the GGA(+U) level of theory111,206, while the molecular database QM9 uses the hybrid-DFT (B3LYP) level of theory205. Surfaces and interfaces are governed by mechanisms such as hybridization, charge transfer, Pauli repulsion, vdW interactions, level alignment, and surface-mediated electronic states15, often requiring beyond-GGA accuracy, as exemplified by the CO on metals puzzle16,17. To complicate matters further, relevant systems frequently comprise hundreds or thousands of atoms, limiting the ability to use a suitable level of theory. These challenges make generating the underlying data difficult and computationally expensive, and can introduce greater intrinsic uncertainty due to varying convergence thresholds.

Excited states and nonadiabatic dynamics

In most cases, the Born-Oppenheimer approximation is valid for describing the dynamics at surfaces; however, especially when considering dynamics at metals or low-bandgap semiconductor surfaces, electronic excitations and ensuing nonadiabatic effects cannot be ignored. Examples of approaches that can be used to simulate nonadiabatic electron-nuclear coupling effects in MD include Ehrenfest dynamics, a range of trajectory surface hopping methods208,209, and MD with electronic friction (MDEF)210. MDEF has been commonly used in the study of reactive scattering and light-driven dynamics of small molecules at metal surfaces. In this method, the dynamics are propagated on a single ground state PES. Atoms experience additional forces due to electronic friction and a random white noise term within a Langevin dynamics framework (see Fig. 5). Two approaches have been proposed to calculate the electronic friction tensor (EFT) from first principles: The local density friction approximation (LDFA)211 and first-order response theory based on DFT (otherwise known as orbital dependent friction or ODF)212,213. In LDFA, the EFT is reduced to scalar friction values based on a bare surface electron density. ODF is calculated from Kohn-Sham DFT through time-dependent perturbation theory and provides a full EFT.

Fig. 5: Exploration of nonadiabatic effects at metal surfaces using MDEF.
figure 5

MLIPs can be combined with electronic friction ML models within the MDEF to accelerate MD simulations beyond the adiabatic approximation.
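The Langevin propagation underlying MDEF can be sketched in one dimension: the ground state force is augmented by an electronic-friction drag and a fluctuation-dissipation-consistent random force. A simple Euler scheme is used for transparency, not for production use.

```python
import numpy as np

def mdef_1d(x0, v0, mass, eta, temp_K, force, dt, n_steps, seed=0):
    """Minimal 1D Langevin integrator in the spirit of MDEF: the force
    is force(x) - eta*v plus a random force whose variance satisfies
    the fluctuation-dissipation relation at temperature temp_K."""
    kB = 8.617e-5                      # eV/K
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(2.0 * eta * kB * temp_K / dt)
    x, v = x0, v0
    for _ in range(n_steps):
        f = force(x) - eta * v + sigma * rng.normal()
        v += f / mass * dt
        x += v * dt
    return x, v

# Free particle at T = 0: electronic friction alone damps the velocity
x, v = mdef_1d(x0=0.0, v0=1.0, mass=1.0, eta=0.5, temp_K=0.0,
               force=lambda x: 0.0, dt=0.01, n_steps=200)
```

In a full MDEF simulation, `force` would come from an MLIP, and the scalar `eta` would be replaced by the (possibly ML-predicted) electronic friction tensor acting on all nuclear degrees of freedom.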

Within a static surface approximation, LDFA can be efficiently evaluated from first principles, requiring only a single calculation of the surface electron density. Alducin, Juaristi, and co-workers published a series of studies on light-driven surface reactions, incorporating NN-based MLIPs in combination with LDFA friction, e.g., to study laser-driven desorption214,215. However, especially at high temperatures, the inclusion of surface degrees of freedom may be crucial in simulating processes at surfaces, which can benefit from the use of flexible and efficient surrogate models of the EFT in addition to the MLIP that describes the PES. Alducin, Juaristi, and co-workers addressed this in later studies by employing a simplified density model based on exponentially decaying functions that allows fast and accurate predictions of electronic friction within LDFA as a function of surface atom motion216,217,218.

Utilizing ML techniques to represent the EFT is crucial within first-order perturbation theory (ODF) due to the high cost of linear response calculations in DFT. Spiering, Meyer, and co-workers introduced a symmetry-adapted six-dimensional NN-based EFT model for H2 and D2 on Cu(111) and for N2 on the Ru(0001) surface219,220. Zhang, Maurer, and co-workers created an NN-based EFT model that accounts for the covariance properties of the EFT with respect to the surface symmetry using a simple mapping scheme and used it to study the reactive scattering of H2 on the Ag(111) surface221,222. The authors further improved the model by preserving the positive semidefiniteness, directional property, and correct symmetry-equivariance of the EFT and tested it on the same example of H2 dynamics at the Ag(111) surface223. The model was also employed to study NO dynamics on Au(111)224. Recently, Sachs et al.225 introduced an Atomic-Cluster-Expansion-based EFT model (ACEfriction), utilizing equivariant representations of tensor-valued functions that satisfy all symmetries of the EFT by construction, allowing highly accurate and efficient prediction of friction and diffusion tensors. The construction of ACEfriction provides size-transferability by enabling the prediction of EFTs for multiple adsorbates and larger friction tensors. The model was tested on the NO/Au(111) system and was also applied by Box et al. in the context of H atom scattering on Pt226.

Nonadiabatic effects are also explored with other techniques. Several ML surrogate models of effective Hamiltonians have been constructed, most commonly coupled with trajectory surface hopping dynamics methods227,228,229,230,231,232,233,234. For example, Liu et al. combined an ML-based Hamiltonian surrogate and force fields with decoherence-induced surface hopping to study defects in MoS2235. Meng et al. used ML to predict excited states based on constrained DFT, constructing an effective diabatic Hamiltonian that was propagated with independent-electron surface hopping dynamics; this was used to study the nonadiabatic dynamics of CO scattering on Au(111)236. ML surrogate models accelerate, and crucially enable, quantum and mixed quantum-classical simulations of gas-surface dynamics, which makes this an exciting application area of ML in the coming years.
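The surrogate Hamiltonians above typically feed into surface hopping workflows. As an illustration of the hopping step itself, here is a generic Tully-style fewest-switches probability in its textbook form; sign and normalization conventions vary between implementations, and this is not the specific algorithm of the cited works:

```python
import numpy as np

def fewest_switches_prob(c, d_kj, v, dt, k, j):
    """Probability to hop from active state k to state j in one timestep.
    c: complex electronic amplitudes; d_kj: nonadiabatic coupling vector
    between k and j; v: nuclear velocity vector."""
    a_kk = np.abs(c[k]) ** 2
    b_jk = -2.0 * np.real(np.conj(c[j]) * c[k] * np.dot(v, d_kj))
    return max(0.0, b_jk * dt / a_kk)  # negative population flux gives zero probability

# toy two-state example: 90% population on the active state
c = np.array([np.sqrt(0.9), np.sqrt(0.1)], dtype=complex)
p = fewest_switches_prob(c, d_kj=np.array([-0.05]), v=np.array([1.0]), dt=0.1, k=0, j=1)
# a hop is then accepted if a uniform random number falls below p
```

In an actual simulation, an ML surrogate supplies the energies, amplitudes, and couplings entering this expression at every timestep.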

Open challenges

Most applications on excited-state and nonadiabatic dynamics at surfaces have so far focused on dynamics within the classical-path approximation, the mean-field approximation, or molecular dynamics with electronic friction. These methods have uncontrolled and insufficiently assessed errors and are limited to situations of weak nonadiabaticity and strong adsorbate-substrate hybridization237. To enable the application of more robust mixed quantum-classical simulation methods to high-dimensional surface systems, ML electronic structure surrogates (as described in Large-scale electronic property and spectroscopy prediction) will play a crucial role. These models will need to cope with sparse data of suboptimal accuracy, as accurate and scalable first-principles reference calculations of excited-state properties of surfaces, for example based on many-body perturbation theory, are not routinely available for data generation. Equally important will be the development of improved dynamics methods that can efficiently capture not only nonadiabatic electron-phonon coupling but also quantum decoherence and electron-electron scattering effects, which cannot be neglected in surface systems with a high electronic density of states (DOS).

State of the field and future perspective

Data-driven and ML methods offer powerful tools to accelerate surface structure determination and energy and force prediction in surface dynamics simulations. ML approaches are used to accelerate geometry optimizations, enabling precise structure determination of molecular adlayers and surface clusters in systems with many degrees of freedom. They also aid in determining energy barriers and transition states, with ML-accelerated NEB methods as a prominent example. ML is increasingly used to enhance or even replace kMC methods. Early approaches to improving the efficiency of surface dynamics relied on corrugation reduction or modified Shepard interpolation. More recently, machine-learned PESs have been used to obtain cost-efficient energies and forces for MD simulations. Beyond the Born-Oppenheimer approximation, ML approaches are applied to learn electronic friction or excited-state surrogate Hamiltonians.

However, ML and data-driven methods in surface science often focus on simple or idealized systems, neglecting surface reconstructions, assuming low temperatures and ultra-high vacuum, focusing on single atoms or small molecules, or imposing commensurability. Highly relevant challenges, such as modeling incommensurate structures, kinetics and growth, high-pressure and high-humidity systems, solid-liquid interfaces with charged ions, electrochemical potentials and conditions, and light-matter interactions, push the limits of current approaches. Bridging this complexity gap requires efficient, massively scalable, and easily deployable ML tools trained on large, well-curated, and accurate datasets built on a synthesis of computational and experimental data. In the following, we summarize the most important challenges for the application of ML methods in surface science.

Benchmarking gap

There is a pressing need for more comprehensive benchmarking of MLIPs for surface simulations. As highlighted in Table 2, recent advancements in dynamic model development are promising, but comprehensive benchmarking of widely used MLIPs on specific surface science datasets, such as OC20/22, remains insufficient. Moreover, benchmarking on databases containing barriers (OC20NEB110) and thermodynamic properties remains largely unaddressed. Addressing this gap calls for a community-wide effort to test a broader range of models on existing surface science datasets while also creating new, more challenging datasets. These should extend beyond equilibrium geometries and transition-state data, capturing the complexity of gas-surface dynamics and other non-equilibrium dynamical phenomena. Establishing a standardized benchmark dataset for gas-surface dynamics could provide a valuable foundation for the assessment of MLIPs.
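At its core, benchmarking an MLIP on such datasets reduces to accumulating per-component errors against reference data and reporting summary metrics. A minimal sketch with a toy harmonic "reference" and a hypothetical model carrying a constant force offset (both are illustrative stand-ins, not real datasets or models):

```python
import numpy as np

def benchmark_forces(model, dataset):
    """Force-component MAE and RMSE of an MLIP against reference data.
    dataset: iterable of (positions, reference_forces) pairs."""
    errors = []
    for positions, f_ref in dataset:
        errors.append((model(positions) - f_ref).ravel())
    errors = np.concatenate(errors)
    return {"mae": float(np.abs(errors).mean()),
            "rmse": float(np.sqrt((errors ** 2).mean()))}

# toy check: harmonic reference forces, "MLIP" with a constant 0.1 offset
rng = np.random.default_rng(0)
dataset = [(p, -p) for p in rng.normal(size=(5, 8, 3))]
metrics = benchmark_forces(lambda p: -p + 0.1, dataset)
```

A standardized benchmark would fix the dataset, the error metrics, and the evaluation protocol so that numbers reported for different MLIPs become directly comparable.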

Accuracy gap

Most current datasets in surface science are derived from DFT, predominantly at the GGA level of theory. While GGA is widely used, it is often inadequate for accurately capturing key surface properties, particularly energy barriers and adsorption energies. Datasets based on hybrid DFT or more advanced exchange-correlation energy descriptions and beyond-DFT methods remain scarce, and their reliability for surface science problems is not fully understood238. Even more critically, there is a significant lack of high-quality experimental data on structure, stability, and kinetics to guide or validate theoretical models50. Relying solely on learning from GGA-DFT data will not fundamentally address the challenges in the electronic structure description of surfaces and interfaces.

Computational surface science data often contains more intrinsic noise and uncertainty compared to molecular data, which needs to be considered when applying ML methods. This is because surfaces are inherently more complex and require larger system sizes than molecules or bulk materials15. The difficulty in imposing tight convergence criteria results in the absence of a definitive “gold standard” for accurate first-principles surface computations. Depending on the system, different computational approaches may be suitable15. Exchange-correlation functionals differ widely in their predictive capabilities, and even for the same density functional approximation, different codes provide different results due to numerical approximations239.

Data gap

Many computational approaches in surface science are data-hungry, often requiring thousands of data points for accurate predictions. Computational data generation is costly due to the high demands of quantum mechanical calculations for large systems, particularly those with long-range interactions such as incommensurate structures or surface reconstructions. Experimental data generation often requires highly controlled environments and time-consuming sample preparation. Efforts to address this include efficient training algorithms such as optimal selection and active learning, which rely on estimators for information gain. A common estimator is the model uncertainty, which, however, often fails to align with true errors, leading to overconfidence in regions poorly covered by training data and a failure to generalize effectively. In highly regularized equivariant MPNNs, uncertainty estimation methods like bootstrapping or committees are less effective115. Addressing these shortcomings requires improvements in uncertainty quantification techniques, better validation protocols, transfer learning, and the incorporation of multi-fidelity approaches that combine high-accuracy data with less expensive, approximate calculations.
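A committee (ensemble) uncertainty estimate of the kind used for active-learning data selection can be sketched in a few lines. The toy committee members below agree near x = 0 and diverge away from it, mimicking extrapolation beyond the training region; everything here is illustrative, not a real MLIP ensemble:

```python
import numpy as np

def committee_uncertainty(models, x):
    """Mean and spread of committee predictions; the spread serves
    as the acquisition score in active learning."""
    preds = np.array([m(x) for m in models])
    return preds.mean(axis=0), preds.std(axis=0)

# toy committee: members agree where "training data" exists (x ~ 0)
# and disagree in the extrapolation region (large x)
rng = np.random.default_rng(0)
models = [lambda x, w=rng.normal(1.0, 0.2): np.sin(x) + w * 0.1 * x**2
          for _ in range(8)]

x_grid = np.linspace(0.0, 5.0, 50)
mean, std = committee_uncertainty(models, x_grid)
next_point = x_grid[np.argmax(std)]  # query where the committee disagrees most
```

The failure mode discussed above corresponds to the committee spread staying small in a region where all members are confidently wrong, so the acquisition score never flags it.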

Transfer learning has gained increasing attention, particularly with the rise of foundation models (see Accurate dynamics at large time- and length scales). While research has primarily focused on MLIPs, there remains a significant gap in developing reliable methods to assess the transferability of ML models across systems. Ensuring transferability between small- and large-scale systems, or across different surfaces and compositions, is crucial. Additionally, generating validation data for large systems will be necessary to confirm this transferability. Little research has been conducted so far on transfer learning for the direct prediction of kinetic properties, barriers, or other properties beyond energies and forces.

The scarcity of high-fidelity computational and experimental data underscores the urgent need for data synthesis methods and multi-fidelity learning, such as multi-head NNs and transfer learning185,240,241. These methods can synthesize information across different levels of theory, combining data from lower-accuracy methods (e.g., GGA-DFT) with more accurate beyond-DFT calculations, such as hybrid DFT, GW, or random phase approximation, and experimental observations. Data augmentation methods, where experimental data is supplemented with synthetically generated datasets, have shown promise for improving the analysis of scattering experiments242. Integrated approaches will help to bridge the gaps between theoretical approximations and experiments, for instance enhancing the modeling of transition barriers, reaction rates, phase diagrams, and spectroscopy. They may also be able to optimize experiments and simulations by guiding resource allocation and improving data efficiency.
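The simplest multi-fidelity scheme, Δ-learning, trains a correction from a cheap level of theory to an expensive one on a small amount of high-fidelity data. The sketch below uses toy analytic "levels of theory" and a polynomial fit as a stand-in for an ML regressor; none of this corresponds to real DFT data:

```python
import numpy as np

# toy "levels of theory": a cheap baseline and an expensive reference
def e_cheap(x):      # abundant, lower-accuracy energies (e.g. GGA-DFT)
    return x ** 2

def e_expensive(x):  # scarce, high-fidelity energies (e.g. hybrid DFT)
    return x ** 2 + 0.3 * np.sin(3 * x)

# learn only the difference, on a small high-fidelity training set
x_hi = np.linspace(-2.0, 2.0, 12)
delta = e_expensive(x_hi) - e_cheap(x_hi)
coeffs = np.polyfit(x_hi, delta, deg=7)  # stand-in for an ML regressor

def e_delta(x):
    """Cheap baseline plus learned high-fidelity correction."""
    return e_cheap(x) + np.polyval(coeffs, x)

x_test = np.linspace(-2.0, 2.0, 101)
err_baseline = np.abs(e_cheap(x_test) - e_expensive(x_test)).mean()
err_delta = np.abs(e_delta(x_test) - e_expensive(x_test)).mean()
```

The correction is typically smoother than the full target, which is why only a few high-fidelity points are needed; multi-head NNs generalize this idea by sharing a representation across fidelities.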

Open and FAIR data

Computational surface science generates vast amounts of data, but only a fraction is typically relevant to specific tasks. This data must be publicly available and well-curated, in compliance with the FAIR data principles243. Table 3 lists common databases that feature surface science data. Most databases are not exclusive to surface science and contain data from the wider field of computational chemistry/physics. Establishing large, consistent datasets is crucial for advancing ML in computational surface science. To foster reproducibility, knowledge transfer, and data reuse, we urge the community to make well-curated computational data publicly accessible.

Table 3 List of common databases for computational materials data as well as automated data generation tools

Efficient and accurate inference

A notable challenge is the improvement of inference performance. Many current ML models are, while powerful and accurate, too computationally expensive for high-throughput predictions of properties or long-time-scale and large-system-size dynamics simulations in surface science. This is particularly important for nonequilibrium dynamics at surfaces that require ensemble averaging over tens of thousands of trajectories (e.g., to evaluate quantum-state-resolved reactive scattering probabilities). New developments in the accurate and efficient description of atomic environments and in MLIP architectures continue to improve accuracy and transferability, while reducing the computational cost of model evaluation, as shown in Fig. 4. GPU architectures and workflows can further reduce computational costs. However, the current bottleneck in deploying or renewing GPU compute architectures should also encourage us to continue to seek faster prediction on CPU architectures and models that benefit from just-in-time compilation.

Scalability and deployability are equally important. Efficiency goes beyond simply reducing inference time; it involves creating scalable and automated workflows that are adaptable to diverse and heterogeneous computing architectures. For complex simulations of nonequilibrium dynamics at surfaces—such as gas-surface interactions, atomic layer deposition, chemical vapor deposition, or rare event sampling—an effective ML approach must support high-throughput ensemble sampling. Achieving this requires solutions that balance computational loads across GPUs and CPUs efficiently. ML frameworks must support batched evaluations to maximize GPU utilization while simultaneously enabling parallel task farming for ensemble dynamics propagation. This dual focus ensures scalability in handling both individual trajectory evaluations and the broader orchestration of ensemble simulations. Furthermore, workflow integration is key. Effective ML deployments for surface science must seamlessly integrate data preprocessing, on-the-fly model evaluation, and post-processing while maintaining robustness and flexibility207,244. Moreover, it is essential to integrate ML model predictions with downstream tasks, such as predicting electronic properties or Hamiltonians, as well as incorporating them into efficient bandstructure calculations245.
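The batched-evaluation idea can be sketched as follows: geometries from all active trajectories are stacked into a single array, so the force model is called once per timestep for the whole ensemble rather than once per trajectory. In this generic NumPy sketch, a toy harmonic potential stands in for a batched MLIP evaluated on a GPU:

```python
import numpy as np

def toy_forces(positions):
    """Stand-in for a batched MLIP call: (n_traj, n_atoms, 3) -> same shape."""
    return -positions  # harmonic toy potential centered at the origin

def propagate_ensemble(positions, velocities, masses, dt, n_steps):
    """Velocity Verlet over a whole trajectory ensemble at once,
    amortizing the model-evaluation cost across trajectories."""
    forces = toy_forces(positions)
    for _ in range(n_steps):
        velocities = velocities + 0.5 * dt * forces / masses
        positions = positions + dt * velocities
        forces = toy_forces(positions)  # one batched call per timestep
        velocities = velocities + 0.5 * dt * forces / masses
    return positions, velocities

n_traj, n_atoms = 1000, 4
rng = np.random.default_rng(0)
pos = rng.normal(size=(n_traj, n_atoms, 3))
vel = np.zeros((n_traj, n_atoms, 3))
masses = np.ones((1, n_atoms, 1))  # broadcasts over trajectories and components
pos, vel = propagate_ensemble(pos, vel, masses, dt=0.05, n_steps=100)
```

Task farming complements this picture: when the ensemble exceeds device memory, batches of trajectories are distributed across workers, each propagating its batch with the same batched model call.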

Machine learning and the future of surface characterization

Multi-technique characterization is a central paradigm in surface science, encompassing various spectroscopic methods, diffraction techniques, and imaging approaches such as scanning probe microscopy. Despite their importance, there has been limited progress in leveraging ML to bridge the gap between simulation and experiment in these techniques. This presents significant untapped potential for innovation and advancement as ML can use atomistic simulation data to complement highly integrated experimental measurement signals. Recent developments illustrate this promise, including the prediction of spectroscopic properties such as Raman and surface-enhanced Raman scattering measurements246, surface characterization using deep learning and infrared spectroscopy247, and classification in X-ray absorption and emission spectroscopy248. Efforts such as learning metastable phase diagrams249, reaction networks140, and catalytic activity from experimental data121 pave the way to learning end-to-end transition rates from experimental data. These efforts demand extensive experimental databases supported by automated labs, similar to the recently proposed OCx24 database250. Finally, efforts such as the application of reinforcement learning to automate scanning probe experiments251 will benefit from models that are aware of hidden atomic-scale conformations252, for example by being trained on complementary first-principles data. These examples underscore the vast opportunities for ML to revolutionize surface science.

Surface science, perhaps more than other fields, offers a unique opportunity to integrate highly controlled experiments aimed at uncovering fundamental physics with the power of ML and data-driven approaches. Its distinct challenges, combined with its significant industrial relevance, continue to drive innovation in computational methods, fostering advancements with transformative impacts that extend well beyond the field itself.