Auto-resolving the atomic structure at van der Waals interfaces using a generative model

Huang, Wenqiang; Jin, Yucheng; Li, Zhemin; Yao, Lin; Chen, Yun; Luo, Zheng; Zhou, Shen; Lin, Jinguo; Liu, Feng; Gao, Zhifeng; Cheng, Jun; Zhang, Linfeng; Ouyang, Fangping; Zhang, Jin; Wang, Shanshan

doi:10.1038/s41467-025-58160-3

Article
Open access
Published: 25 March 2025

Nature Communications volume 16, Article number: 2927 (2025)

Abstract

The high-resolution visualization of atomic structures is significant for understanding the relationship between the microscopic configurations and macroscopic properties of materials. However, a rapid, accurate, and robust approach to automatically resolve complex patterns in atomic-resolution microscopy remains difficult to implement. Here, we present a Trident strategy-enhanced disentangled representation learning method (a generative model), which utilizes a few unlabelled experimental images with abundant low-cost simulated images to generate a large corpus of annotated simulation data that closely resembles experimental results, producing a high-quality large-volume training dataset. A structural inference model is then trained via a residual neural network which can directly deduce the interlayer slip and rotation of diversified and complicated stacking patterns at van der Waals (vdW) interfaces with picometer-scale accuracy across various materials (e.g. MoS₂, WS₂, ReS₂, ReSe₂, and 1T’-MoTe₂) with different layer numbers (bilayers and trilayers), demonstrating robustness to defects, imaging quality, and surface contaminations. The framework can also identify pattern transition interfaces, quantify subtle motif variations, and discriminate moiré patterns that are difficult to distinguish in frequency domains. Finally, the high-throughput processing ability of our method provides insights into a vdW epitaxy mode where various thermodynamically favorable slip stackings can coexist.

Introduction

Two-dimensional (2D) van der Waals (vdW) materials have profoundly expanded the design space of artificial solids, because the relative rotation and slip between adjacent atomic planes can be manipulated. The interlayer twist generates a moiré superlattice with a long-period potential that triggers exotic physical properties such as superconductivity^1,2, ferroelectricity^3,4, and nontrivial topological states^5,6. The relative sliding between layers alters the atomic registry, which makes it possible to tailor a material’s electrical^7,8,9,10, magnetic^11,12,13, and catalytic properties¹⁴. Scanning transmission electron microscopy (STEM) can provide sub-angstrom-scale configuration information by raster scanning across the sample surface atom by atom and has become a powerful tool for uncovering the structure-property relations of various nanostructures including 2D vdW materials^15,16,17, thus laying the foundation for rational structural design and accurate performance prediction by theoretical calculations. However, the analysis of STEM images still predominantly relies on human experts and therefore suffers from long time consumption, inferior labeling accuracy, individual identification bias, poor tolerance for image imperfections (e.g. low image signal-to-noise ratio (SNR)¹⁸, surface contaminations¹⁹, etc.), limited analytical capability for complex patterns, and a low throughput that hinders the discovery of statistically grounded clues.

Machine learning (ML) algorithms bring opportunities for automatic classification, identification, and feature extraction of atomic-resolution images, which, in principle, can be divided into unsupervised and supervised strategies. Unsupervised learning is capable of uncovering inherent patterns and relationships of unlabeled data by techniques of clustering, association, and dimensionality reduction without upfront human intervention and has been employed to classify point defects^20,21,22 and extract structural motifs²³. However, the output of unsupervised learning often suffers from subjectivity and inferior interpretability due to a lack of predefined target variables, which restricts the tasks that the method can handle, makes objective evaluation of the model performance difficult, and commonly requires domain knowledge from human experts to achieve comprehensible and insightful discoveries. Supervised learning is another well-established approach that has been utilized to localize atomic columns^{21,24,25,26,27,28,29}, identify vacancies and dopants^24,30,31,32, segment polymorphs^30,33,34, categorize crystal structures^34,35,36,37, assign molecular chirality³⁸ and stacking orders³⁹, etc. This method has explicit performance evaluation (measured by errors between real and predicted values through the loss function), high prediction accuracy, generalizability to unseen data, and flexibility for a wide range of tasks.

However, supervised learning requires a large corpus of annotated data for model training, which comes from either manual labeling of experimental images^27,38,39 or simulated data generated by software with ground truth^{25,26,29,30,33,34,40}. The former has high training data quality but suffers from laborious labeling work, poor annotation accuracy, and scarcity of experimental images. The latter can satisfy data sufficiency at a low cost. However, the quality of the simulated data is severely inferior to experimental images due to prominent discrepancies in the visual style, which is determined by various factors including detector noise, scanning distortion, lens aberrations, surface contamination, etc. The difficulty in simultaneously achieving high quality and large volumes of the training dataset restricts the current application of supervised learning in atomic-resolution image analysis to simple questions, such as classifying limited types of microstructures with obvious pattern disparity, with inference accuracy highly sensitive to the image presentation state (e.g. SNR¹⁸, defects^17,41,42 and contamination¹⁹, image drift⁴³ and astigmatism⁴⁴, etc.). Can we extend the ability of supervised learning algorithms from “identifying” discrete and finite microstructures, which is a simple classification question (e.g. differentiating atomic defects from pristine lattices that show obvious and fixed structural disparity) to “solving” complex patterns with almost continuous variations and subtle disparities, which is a regression question at a higher difficulty level (e.g. outputting accurate interlayer slip vectors and twist angles corresponding to different stacking patterns whose atomic position differences are only on the picometer scale)? Clark et al. pioneered the introduction of a cycle generative adversarial network (CycleGAN) to augment simulated STEM images with realistic spatial frequency information, which has demonstrated feasibility in identifying atomic defects³². However, the imperfection of CycleGAN in strictly maintaining image contents after style transfer limits the scope of the scientific problems that can be handled by this algorithm (Supplementary Fig. 1). Therefore, it is highly desirable to develop a fast, accurate, and robust supervised learning framework that can automatically accomplish complex structural analysis tasks.

In this paper, we develop a Trident strategy-enhanced disentangled representation (DR) learning approach⁴⁵, which utilizes a small set of unlabeled experimental STEM images with abundant low-cost simulated images to generate a large annotated training dataset that closely resembles experimental image styles with the simulation image contents strictly maintained after style conversion, thus showing a superior balance between the quality and quantity of training data. A structural inference model is then trained by these high-quality simulated images using a residual neural network, which enables direct output of the interlayer slip and rotation of diversified and complicated stacking patterns in an end-to-end manner with an accuracy of picometer level. Our framework can also identify stacking pattern transition interfaces, quantify subtle motif variations with a high spatial resolution, and discriminate moiré patterns that are difficult to distinguish in the frequency domains. Our model demonstrates robustness to defects, imaging quality, and surface contaminations and can be generalized to various vdW materials (e.g. MoS₂, WS₂, ReS₂, ReSe₂, and 1T’-MoTe₂) and different layer numbers (e.g. bilayer and trilayers).

Results

Overview of the ML framework

The first step of the framework is to train a generative model via a Disentangled Representation for Image-to-Image Translation (DRIT) algorithm⁴⁵ that can produce high-quality STEM simulated images (Fig. 1a). It is realized by combining the structural information (e.g. position, brightness, and size of atoms) from the software-generated low-quality, noise-free simulated images with the visual style from the experimental images. The second step is to define structural descriptors for slip and twisted stackings that can represent all potential stacking configurations, followed by the generation of realistic STEM simulated images via the DRIT model trained in the first step (Fig. 1b). A large training dataset having precise labels and high stylistic similarity with experimental images is thus achieved, alleviating the problem of data scarcity caused by the high cost of the STEM experiments and the inefficiency of atom-by-atom manual labeling of experimental images. The descriptor of the slip stacking is a slip coordinate (D_a, D_b) acquired by the decomposition of the slip vector D along two in-plane base vector directions of the monolayer unit cell, while for the twisted stacking, an interlayer rotation angle θ is applied (Supplementary Fig. 2). The third step is to train an end-to-end stacking structure identification model using a ResNet-50 architecture as the backbone of the regression network. The relations between the stacking structural labels, i.e. (D_a, D_b) and θ, and the realistic STEM simulated images are learned by two separate ResNet models (Fig. 1c), thus enabling the straightforward, accurate, and efficient auto-resolving of the interlayer sliding and twist at the vdW interface from the experimental images.

**Fig. 1: Machine learning (ML) workflow.**

The key to the overall workflow is the DRIT model training, which determines whether the abundant, low-cost but also low-quality STEM simulated images can be successfully transformed into the high-quality counterparts with structural information strictly unchanged and visual style greatly resembling the experimental images so that large training data can be obtained for the following supervised learning. Two points require in-depth comprehension. One is the reason for selecting the DRIT algorithm for the style transfer. The other is the modifications required to the basic DRIT model for better task accomplishment.

For the first point, it is essential to recognize two primary challenges of our visual effect transformation task. First, the training data is severely unbalanced, where the simple, noise-free STEM simulated images can be easily generated in batches by the Computem software but the experimental images from which the realistic visual style can be extracted are scarce. Second, the stacking structure is much more complex than the previously discussed point defects (e.g. vacancies and dopants), displaying obvious configuration diversity and subtle structural discrepancy (at sub-angstrom scale) between different stackings. Fortunately, the DR learning algorithm represented by the DRIT model is proficient in solving these difficulties. It decouples an image into two distinct spaces: a domain-invariant content space capturing shared structural information across the noise-free simulated images and the experimental images, and a domain-specific attribute space independently extracting the visual styles from the noise-free simulated images and the experimental images. Such algorithm architecture together with three strategies, i.e., weight sharing, a content discriminator, and a cross-cycle consistency loss, ensures effective visual style conversion with the image contents strictly unchanged for unbalanced and unpaired training data involving complex structural information (Supplementary Figs. 3–5). Moreover, DR learning aligns with the human cognitive pattern, exhibiting explicit translation stage by stage, thus improving the interpretability and comprehensibility of the algorithm.

For the second point, despite the superiority of the basic DRIT model in style transformation to some other algorithms such as CycleGAN⁴⁶, we developed a “Trident strategy” to further strengthen the reliability and robustness of the model. Three operations were conducted: (i) Data selection. We applied an unsupervised learning method called the t-distributed stochastic neighbor embedding (t-SNE) algorithm to topologically cluster the large numbers of programmatically generated noise-free simulated images and select those that have relatively high structural similarity with the experimental images for the subsequent DRIT model training (Fig. 1a(I)) (see Methods). This step is especially significant when the number of the experimental images employed for the model training is minimal (e.g. only three experimental STEM images were used for the bilayer ReS₂ slip stacking training task) since it helps the model avoid misassigning image features that belong to the domain-invariant content space (structure information) to the domain-variant attribute space (visual style). (ii) Data augmentation. Affine transformations (e.g. cropping, scaling, rotation, and flipping), which preserve collinearity and distance ratios of points but alter and diversify their absolute coordinates, were then applied to both experimental and selected simulated images, thus enhancing the ability of the DRIT algorithm to maintain the structure information during the image style conversion (Fig. 1a(II)). (iii) Content consistency loss. The experimental and simulated STEM images after data selection and augmentation were finally introduced to the DRIT model for training (Fig. 1a(III)), whose architecture involves the content encoders ($E_{\mathrm{exp.}}^{c}$, $E_{\mathrm{sim.}}^{c}$) encoding the atomic structures from both the experimental and simulated images into a mutual content space, the attribute encoders ($E_{\mathrm{exp.}}^{a}$, $E_{\mathrm{sim.}}^{a}$) encoding the visual style from the experimental and the simulated images independently into their respective attribute spaces, and the generators ($G_{\mathrm{exp.}}$, $G_{\mathrm{sim.}}$) receiving the specified spaces’ contents and attributes to reconstruct images. If the structure and the style are correctly decoupled for both experimental and simulated images, the generated simulation image should adopt the same atomic configuration as the original experimental image, exhibiting only a discrepancy in the visual style. Similar scenarios should apply to the original simulation image and the generated experimental image. However, due to the lack of explicit constraints between the simulation and the experimental domains, the above scenarios cannot always be guaranteed for the basic DRIT model. Therefore, a content-consistent loss function $L_{\mathrm{consistent}}^{\mathrm{content}}$ was designed as follows to ensure structure invariability before and after the image transformation:

$$L_{\mathrm{consistent}}^{\mathrm{content}}\left(x,y,u,v\right)=\mathbb{E}_{x,y}\left[\left\|u-y\right\|_{1}+\left\|v-x\right\|_{1}\right]$$

(1)

where x and y represent the original simulation and experimental images, while u and v represent the generated simulation and experimental images, respectively. Note that only when Trident strategies are implemented together on the DRIT model can the noise-free simulated images be transformed into high-quality ones with a visual style highly resembling the experimental images and atomic structure unchanged (Supplementary Figs. 6–9).
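As a rough illustration of Eq. (1), the following is a minimal sketch that evaluates the loss on plain nested lists, with the expectation approximated by a batch average. It is not the authors' implementation: a real training loop would compute this on tensors inside a deep learning framework, and the names `l1_norm_diff` and `content_consistency_loss` are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def l1_norm_diff(p: Image, q: Image) -> float:
    """Sum of absolute pixel differences, i.e. ||p - q||_1."""
    if len(p) != len(q) or any(len(rp) != len(rq) for rp, rq in zip(p, q)):
        raise ValueError("images must have the same shape")
    return sum(abs(a - b) for rp, rq in zip(p, q) for a, b in zip(rp, rq))


def content_consistency_loss(
    batch: Sequence[Tuple[Image, Image, Image, Image]]
) -> float:
    """Batch average of ||u - y||_1 + ||v - x||_1, following Eq. (1).

    Each batch item is (x, y, u, v): original simulated image, original
    experimental image, generated simulated image, generated experimental
    image, in the notation of the text.
    """
    total = 0.0
    for x, y, u, v in batch:
        total += l1_norm_diff(u, y) + l1_norm_diff(v, x)
    return total / len(batch)


# Toy 2x2 images (made-up numbers, for illustration only).
x = [[0.0, 1.0], [1.0, 0.0]]
y = [[0.1, 0.9], [0.8, 0.2]]
u_off = [[0.1, 0.9], [0.8, 0.0]]  # differs from y in a single pixel

loss_perfect = content_consistency_loss([(x, y, y, x)])
loss_off = content_consistency_loss([(x, y, u_off, x)])
```

The loss vanishes only when each generated image reproduces its target pixel by pixel, which is the explicit cross-domain constraint that the text says the basic DRIT model lacks.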

Performance of the Trident strategy-enhanced DRIT model

We selected an annular dark field (ADF) STEM image of slip-stacked bilayer ReS₂ as a test case (Fig. 2a) and compared the image generation quality from two perspectives: structural consistency and stylistic similarity. Simulation images generated by three different approaches were compared: the noise-free simulated image generated by the Computem⁴⁷ software based on the manually resolved atomic model corresponding to panel (a) (Fig. 2b), the simulated image constructed by adding Gaussian noise to panel (b) (Fig. 2c), and the Trident strategy-enhanced DRIT model-generated image combining the structure information in panel (b) with the visual style in panel (a) (Fig. 2d). Atom-by-atom comparison between panels (b) and (d) demonstrates good structural retention of the DRIT algorithm without atomic misalignment when conducting style transformation (Fig. 2e). Figure 2f displays the grayscale distributions of panels (a) to (d), where the DRIT-generated image best matches the experimental image regarding the peak location and the full width at half maximum. We introduced the Kullback-Leibler divergence (D_KL) to measure the grayscale distribution difference between the three types of simulated images and the experimental one, obtaining 0.5 for panel (b), 0.12 for panel (c), and 0.01 for panel (d) (the smaller, the more similar), which quantitatively verifies the highest overall stylistic similarity between the DRIT-generated image and the experimental one. The intensity line profiles taken at the same locations (dashed lines) in panels (a) to (d) display local visual style resemblance between the simulated and experimental images (Fig. 2g), where the arithmetic mean deviation (R_a) evaluating the average fluctuation of each curve is 0.18 for panel (a), 0.25 for panel (b), 0.22 for panel (c), and 0.20 for panel (d) (the closer to the experimental value, the more similar), further supporting the superiority of the DRIT algorithm in style learning. We also investigated the D_KL and R_a of simulation images generated using the JEMs⁴⁸ software, yielding 0.36 and 0.25, which are also inferior to the DRIT-generated data (Supplementary Fig. 10). A timing comparison shows costs of ～1 s, ～2 s, and ～160 s for generating a 1024 × 1024 simulation image using the DRIT algorithm, Computem, and JEMs software, respectively, representing an advantage in generation efficiency of our method.
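The two similarity metrics used above can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not specify the histogram binning, the direction of the divergence (which image plays the role of P), or the exact form of R_a, so the code assumes 256 integer gray levels, D_KL(P‖Q) with natural logarithms, and R_a as the mean absolute deviation of a line profile from its own mean. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def histogram(pixels: Sequence[int], bins: int = 256) -> List[float]:
    """Normalized grayscale histogram for integer pixel values in [0, bins)."""
    counts = [0] * bins
    for p in pixels:
        counts[p] += 1
    n = len(pixels)
    return [c / n for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float],
                  eps: float = 1e-12) -> float:
    """D_KL(P || Q) = sum_i p_i * ln(p_i / q_i).

    Bins with p_i == 0 contribute nothing; eps guards against empty bins in Q.
    """
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)


def mean_deviation(profile: Sequence[float]) -> float:
    """Arithmetic mean deviation R_a of an intensity line profile
    (assumed here to be the mean absolute deviation from the profile mean)."""
    mean = sum(profile) / len(profile)
    return sum(abs(v - mean) for v in profile) / len(profile)


# Toy data (made-up values, for illustration only).
h_exp = histogram([0, 0, 1, 3], bins=4)
h_sim = histogram([0, 1, 1, 3], bins=4)
d_same = kl_divergence(h_exp, h_exp)
d_diff = kl_divergence(h_exp, h_sim)
ra = mean_deviation([0.0, 1.0, 0.0, 1.0])
```

A divergence of zero means identical grayscale distributions, so smaller values indicate a closer global style match, while R_a compares local fluctuation along a single line profile.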

**Fig. 2: Performance of the Trident strategy-enhanced generative model.**

Structural analysis of the slip-stacked interfaces

The framework was first applied to resolve the atomic registries of slip-stacked vdW bilayers, which have rotationally aligned top and bottom layers (no interlayer twist) but exhibit sub-angstrom-scale discrepancies in the interlayer sliding, thus showing various physical properties. Although the structural information of different slip stackings is encoded in their complex-valued 2D fast Fourier transform (FFT) (Supplementary Fig. 11) and can be resolved by advanced diffraction techniques like the four-dimensional STEM Bragg interferometry methodology^49,50, the atom-by-atom analysis of the real-space, high-resolution ADF-STEM images is still the simplest and swiftest means to identify them without high requirements on the equipment. Bilayer ReS₂, which was experimentally observed to display diversified slip stacking patterns^15,51, was selected as a test case to evaluate four abilities of this framework: (i) resolving the slip stacking configuration from a raw ADF-STEM image, (ii) quantitatively perceiving subtle structural evolution of the pattern, (iii) accurately localizing the pattern transition interface, and (iv) efficiently performing statistical analysis on large data volumes and thereby contributing to new discoveries.

The top panels of Fig. 3a show 6 representative slip stacking patterns involving structures with both overlapping atoms (inconsistent brightness between different atomic columns) and staggered counterparts (uniform brightness of each atomic column). These images were captured in three different imaging sessions with different instrument states and aberration parameters. They were input into the structure inference model with neither noise smoothing nor brightness and contrast adjustment (Supplementary Fig. 12a). Our inference model can swiftly determine the slip vector coordinates, which are subsequently transformed into atomic models automatically (bottom panels) and verified to be correct based on both expert knowledge and image simulation (Supplementary Fig. 12b, c). Note that the accuracy of the model’s resolved coordinates depends upon the step size utilized to generate the realistic ADF-STEM simulated image dataset. We used the DRIT-generated bilayer ReS₂ images with a step size of 0.05 Å as a test dataset and employed the Euclidean distance ∆D to evaluate the accuracy of the inferred slip coordinates from different inference models trained by the DRIT-generated images with step sizes ranging from 0.1 to 0.4 Å (in increments of 0.1 Å). The Euclidean distance ∆D is represented as follows:

$$\Delta D=\left|\mathbf{D}_{\mathrm{inf}}-\mathbf{D}_{\mathrm{gt}}\right|$$

(2)

where D_inf is the inferred slip vector from the inference models trained by a generated dataset with a certain step size, and D_gt is the ground truth slip vector from the test dataset with precise labels (inset of Fig. 3b). The boxplot in Fig. 3b shows that both the average ∆D and the ∆D corresponding to the middle 95% of the data (box upper limit) increase as the step size rises, implying degradation of the model accuracy. Considering the balance between the inference model accuracy and the training cost, we chose a step size of 0.1 Å to construct the simulated image dataset, yielding a mean ∆D of 0.03 Å with 95% of the inferred results deviating less than 0.05 Å from the ground truth, which suffices for solving our experimental images whose spatial resolution is ～0.7 Å (Supplementary Figs. 13 and 14).
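The error metric of Eq. (2) can be sketched as below. This is a minimal example resting on assumptions that the text does not state: D_a and D_b are treated as lengths in Å along the unit directions of the two in-plane base vectors, and the angle between those directions (`gamma_deg`) is a hypothetical parameter that would have to be taken from the actual unit cell. The percentile uses a simple nearest-rank rule.

```python
import math
from typing import Sequence, Tuple

Slip = Tuple[float, float]  # slip coordinate (D_a, D_b) in angstroms


def slip_to_cartesian(d_a: float, d_b: float,
                      gamma_deg: float) -> Tuple[float, float]:
    """Convert components along base directions a and b (gamma_deg apart)
    into Cartesian x, y with a along the x axis."""
    g = math.radians(gamma_deg)
    return (d_a + d_b * math.cos(g), d_b * math.sin(g))


def delta_d(inferred: Slip, truth: Slip, gamma_deg: float) -> float:
    """Euclidean distance |D_inf - D_gt| between two slip vectors (Eq. 2)."""
    xi, yi = slip_to_cartesian(inferred[0], inferred[1], gamma_deg)
    xt, yt = slip_to_cartesian(truth[0], truth[1], gamma_deg)
    return math.hypot(xi - xt, yi - yt)


def summarize(errors: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, 95th percentile by nearest rank) of a list of errors."""
    s = sorted(errors)
    rank = -(-95 * len(s) // 100)  # integer ceil(0.95 * n)
    return sum(s) / len(s), s[rank - 1]


# Toy values (made up, for illustration only).
err_orthogonal = delta_d((0.3, 0.4), (0.0, 0.0), gamma_deg=90.0)
err_oblique = delta_d((1.0, 1.0), (0.0, 0.0), gamma_deg=60.0)
mean_err, p95_err = summarize([0.01] * 19 + [0.2])
```

Converting to Cartesian coordinates before taking the norm matters for a non-orthogonal cell: the same coordinate difference gives a different physical distance depending on the angle between the base vectors.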

**Fig. 3: Automated structural analysis of slip-stacked van der Waals (vdW) bilayers.**

Our picometer-level accurate framework can be readily generalized to measure faint slip stacking shifts in a large-area STEM image (Fig. 3c). Limited by inferior recognition accuracy and processing throughput of human experts, this image containing more than 3000 atoms was commonly assigned to one slip stacking structure. However, our algorithm took the lower right corner as the reference region and resolved the slip vector nanometer-by-nanometer, revealing ～0.05 Å/nm slip evolution with subtle interlayer sliding direction variations, as highlighted by the colorful arrows (Supplementary Fig. 15). Such a phenomenon implies that the top and bottom layers are not rigidly and flatly stacked, potentially involving ripples that induce interlayer spacing fluctuations.

The grain boundary often triggers property mutation and is pivotal for microstructure analysis. In slip-stacked bilayer ReS₂, an instant stacking pattern transition was observed due to grain boundaries in the bottom layer, which altered the growth direction in the top lattice via vdW epitaxy (Fig. 3d, Supplementary Fig. 16)⁵². We applied a strided pattern matching technique (SPMT) with variable step sizes to balance the global search efficiency and the local analysis resolution at the transition interface (see Methods). A sliding window coarsely scanned across bunches of STEM images with a large stride in the first round. Then, the pattern transition interface was extracted automatically, where a small stride comparable with the length of a chemical bond (step size: 1 Å) was employed to locate the pattern transition interface precisely with the boundary effect mitigated and intricate structural features unveiled (Supplementary Fig. 17a–e). The D_a and D_b mappings in Fig. 3e and Supplementary Fig. 17f display a sharp pattern-switching interface, which agrees well with the human expert knowledge (yellow dashed line) (Supplementary Fig. 18).
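The coarse-then-fine search described above can be illustrated with a one-dimensional sketch. It is not the SPMT implementation from the Methods: the real technique slides a 2D window over STEM images and compares inferred (D_a, D_b) values, whereas here a hypothetical `infer` callback returns a discrete stacking label at a position along a line. The coarse stride of 20 Å is an assumed value; only the 1 Å fine stride comes from the text.

```python
from typing import Callable, Optional, Tuple


def locate_transition(
    infer: Callable[[int], int],
    start: int,
    stop: int,
    coarse: int,
    fine: int,
) -> Optional[Tuple[int, int]]:
    """Bracket the first label change along a line scan (positions in Å).

    Round 1 scans with the coarse stride until the label changes.
    Round 2 rescans only that interval with the fine stride.
    Returns (lo, hi) with hi - lo == fine, or None if no change is found.
    """
    prev_pos, prev_label = start, infer(start)
    bracket = None
    for i in range(1, (stop - start) // coarse + 1):
        pos = start + i * coarse
        if infer(pos) != prev_label:
            bracket = (prev_pos, pos)
            break
        prev_pos = pos
    if bracket is None:
        return None

    lo, hi = bracket
    for j in range(1, (hi - lo) // fine + 1):
        pos = lo + j * fine
        if infer(pos) != prev_label:
            return (pos - fine, pos)
    return bracket  # fallback when the fine stride does not divide the bracket


def toy_infer(position: int) -> int:
    """Toy stand-in for the inference model: the pattern switches at 57 Å."""
    return 0 if position < 57 else 1


interface = locate_transition(toy_infer, start=0, stop=100, coarse=20, fine=1)
no_interface = locate_transition(lambda p: 0, start=0, stop=100, coarse=20, fine=1)
```

In this toy run the coarse pass needs only four model calls to narrow the interface to a 20 Å interval, and the fine pass is confined to that interval, which is the efficiency argument made in the text.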

We fed 150 large-scale images involving ～5 × 10⁵ atoms to the inference model, which were programmatically divided into small patches with a side length of 2 nm for high spatial resolution analysis. The (D_a, D_b) values of 3750 patches were resolved within 4 min, a time cost almost two orders of magnitude lower than that of human analysis. The slip coordinates are projected onto a density functional theory (DFT) calculated potential energy landscape (PEL) of bilayer ReS₂, representing the interlayer energy undulations when one layer slips over another. Two new phenomena were revealed (Fig. 3f): (i) Bilayer ReS₂ exhibits diversified slip stacking structures, whose coordinates are almost continuously distributed on the PEL diagram, distinct from most 2D vdW materials (e.g. graphene and MoS₂), which adopt very limited energetically favorable slip stacking configurations (commonly fewer than 3 types)^53,54,55,56. (ii) The slip coordinates primarily aggregate in the low energy regions on the PEL, as marked by the black dashed lines, suggesting a thermodynamically driven formation mechanism. The discovery of such a superlubricity-like stacking behavior deepens our comprehension of the vdW epitaxy. It implies a potential approach to constructing diversified slip stackings by direct synthesis, whose key point may lie in selecting low-symmetry vdW materials with weak interlayer coupling, like triclinic ReS₂, so that their PELs adopt abundant energy minima with gentle energy undulation (Supplementary Fig. 19), making various slip stacking configurations accessible under mild ambient thermal disturbance.
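The patching step and the throughput figures quoted above can be checked with a short sketch. The tiling function is hypothetical, and the 10 nm × 10 nm field of view is an assumption chosen only because it reproduces the stated ratio of 3750 patches to 150 images (25 patches per image); the text does not give the image dimensions or say whether patches overlap.

```python
from typing import List, Tuple


def patch_origins(width_nm: float, height_nm: float,
                  patch_nm: float = 2.0) -> List[Tuple[float, float]]:
    """Top-left corners of non-overlapping square patches tiling an image.

    Partial patches at the right and bottom edges are discarded.
    """
    nx = int(width_nm // patch_nm)
    ny = int(height_nm // patch_nm)
    return [(ix * patch_nm, iy * patch_nm)
            for iy in range(ny) for ix in range(nx)]


# Assumed 10 nm x 10 nm field of view (not stated in the text).
origins = patch_origins(10.0, 10.0)
patches_per_image = len(origins)
total_patches = 150 * patches_per_image

# Upper bound on the average time per patch implied by "within 4 min".
seconds_per_patch = (4 * 60) / 3750
```

Under these assumptions the stated numbers correspond to at most about 64 ms of inference per 2 nm patch.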

Robustness and generalizability of the inference model

The inference accuracy of our model remains robust when the experimental images suffer from a certain concentration of defects or a low SNR. This is particularly significant for STEM analysis of 2D materials with atomic thinness since the electron beam (EB) often induces radiation damage^57,58 like vacancies and holes to these fragile membranes while reducing the EB dosage to suppress the defect generation will inevitably lead to deterioration of the image quality (Fig. 4a). Data augmentations were employed during the inference model training such as adding random proportions of defects, masks, contaminations, and Gaussian noise to the training set (Supplementary Fig. 20). We then used the DRIT-generated realistic STEM simulated images with known structural labels and randomly embedded vacancies to test the inference model’s sensitivity to defects and image SNR (Supplementary Figs. 21 and 22, Supplementary Tables 1 and 2). Figure 4b shows that even if the defect concentration rises to 10% (insets of Fig. 4b), the average ∆D is still constrained at 0.32 Å with 75% of the total test data yielding ∆D of less than 0.2 Å (Supplementary Fig. 23a). The model also displays insensitivity to the SNR degradation, where the average ∆D is stable at ～0.04 Å for the image peak signal-to-noise ratios (PSNR) ranging from 30 to 7 (Fig. 4c). Even when the PSNR decreases to 7 (right inset of Fig. 4c), there still exists more than 90% of the test data having ∆D of less than 0.2 Å (Supplementary Fig. 23b). Our model also remains robust when the specimens suffer from surface contamination (Supplementary Fig. 23c, d).
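For readers unfamiliar with the PSNR scale used in this test, the following minimal sketch applies the standard definition, PSNR = 10·log₁₀(MAX² / MSE). The text does not state which convention or units were used, so the decibel form and the normalization of intensities to [0, 1] are assumptions, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def psnr(reference: Sequence[float], noisy: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images."""
    mse = sum((r - n) ** 2 for r, n in zip(reference, noisy)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def add_gaussian_noise(pixels: Sequence[float], sigma: float,
                       seed: int = 0) -> List[float]:
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    rng = random.Random(seed)
    return [p + rng.gauss(0.0, sigma) for p in pixels]


# Toy flat image with 10,000 pixels (made-up data, for illustration only).
clean = [0.5] * 10_000
noisy = add_gaussian_noise(clean, sigma=0.1)
psnr_noisy = psnr(clean, noisy)
psnr_exact = psnr([0.0] * 4, [0.1] * 4)
```

With intensities normalized to [0, 1], Gaussian noise of standard deviation σ gives a PSNR of roughly −20·log₁₀(σ) dB, so under this convention a PSNR of 7 corresponds to noise comparable in magnitude to the signal range.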

The framework was subsequently utilized to resolve experimental images of defective slip-stacked bilayer ReS₂ (having holes in one layer, as marked by the white box in Fig. 4d). The D_a and D_b mappings display slip coordinates around the hole similar to those in the defect-free areas (Fig. 4e and Supplementary Fig. 24a, b), while the slight discrepancy is ascribed to the hole-induced lattice deformation. Further investigation shows that the model’s parsing capability remains intact even if the hole accounts for 25% of a patch area (Supplementary Fig. 24c, d). The model can also provide reasonable structural outputs for the experimental image of slip-stacked bilayer ReS₂ with grain boundaries (Supplementary Fig. 25). Then, we selected 10 sets of paired high and low SNR STEM images. Each image pair was taken over the same region of the 2D material but with a 4-fold difference in EB irradiation dose. We applied the model to analyze the high and low SNR images and compared the inferred slip vectors between them, finding an average ∆D difference of only 0.15 ± 0.08 Å (Fig. 4f). A typical pair of high and low SNR experimental images is shown in Fig. 4g (i and ii), where the atomic model resolved from the low SNR image (iii) by our algorithm is displayed in the top right corner. The noise-free simulated ADF-STEM image (iv) based on the atomic model agrees well with the high SNR experimental image (i), indicating the inference validity of the low SNR counterpart. The intensity line profiles taken along the same regions in panels (i), (ii), and (iv) exhibit similar line shapes, further demonstrating the strong resolving ability of our framework on experimental images of inferior quality.

The model can be readily generalized to the structural analysis of slip-stacked trilayers, in which case human experts can only employ trial and error, like playing jigsaw puzzles, to infer potential answers because of the explosion in structural complexity. The top right panels in Fig. 4h illustrate the reasoning process of humans in the face of a trilayer ReS₂ experimental STEM image, where different potential atomic registries between the three layers are sought sequentially with the image simulation conducted one by one until the simulated image matches the experimental observation. However, our ML approach only requires an extension of the slip stacking structural descriptor from one set of slip coordinates (D_a, D_b) for the bilayer scenario to two sets of slip coordinates, (D_a, D_b) and (D_c, D_d), to represent the interlayer displacements at the 1st–2nd interface and the 2nd–3rd interface, respectively. Then a structural inference model for trilayer stacks was trained utilizing the DRIT-generated high-quality simulated images with an increased dataset volume, which can readily export atomic registries of trilayer slip stackings in an end-to-end manner (bottom panel in Fig. 4h, Supplementary Fig. 26). In addition, our workflow can also be extended to uncover slip stacking structures of other vdW materials with different crystal structures and elements (Supplementary Figs. 27–30).

Structural analysis of the twist-stacked interfaces

The ML framework can also directly resolve the twist angle of vdW materials based on the moiré pattern captured by the STEM image, which is crucial for comprehending the structure-property relationship of such superlattices. One may argue that the interlayer twist can be swiftly figured out from the frequency space by conducting the FFT of the STEM image. However, in some scenarios, such a strategy may have limitations in feasibility and accuracy. First, some moiré patterns corresponding to different twist angles display similar FFT diagrams with indistinguishable disparity. Second, the FFT diagrams of small-area moiré patterns exhibit poor resolution, leading to remarkable errors in angle measurement. All the above circumstances require a method that can directly, accurately, and uniquely assign the twist angle based on the stacking pattern in the real space.

To realize this goal, we modified the framework. Unlike slip stackings, where one set of slip coordinates corresponds to one stacking pattern with a short periodicity length at the sub-nanometer scale, twist-stacked vdW materials commonly generate superlattices with a much longer periodicity length, which is determined by the twist angle, the crystal symmetry, and the lattice parameters. Taking bilayer MoS₂ as an example, its periodicity length (L)⁵⁹ can be expressed as follows:

$$L=\frac{a}{\sqrt{2(1-\cos \theta )}}$$

(3)

where θ represents the twist angle and a is the lattice constant of monolayer MoS₂ (0.3165 nm). If the twist angle is 1.5°, the periodicity length reaches 12.08 nm. As described above, the structure inference model is trained with DRIT-generated simulation image patches of ~2 nm in size as the input and the interlayer structural descriptor as the output. However, the periodicity length of twist-stacked bilayers often exceeds the patch size. In this case, we use a sliding window with a side length of ~2 nm to scan an area with a side length of not less than L on a DRIT-generated STEM simulation image at a step size (∆L) of 0.2 Å. A set of images acquired at different locations of one moiré pattern is thus collected; together they describe all the possible patterns corresponding to one twist angle and are used as the training data for that twist angle (Fig. 5a). Multiple sets of DRIT-generated images and their corresponding twist angles are obtained in this way and then used for inference model training, yielding an average error as small as 0.29°, with 97.5% of the total data having an error of less than 1° (Supplementary Fig. 31).
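As a quick numerical check of Eq. (3) and of how many window positions the 0.2 Å step implies, the following minimal sketch may help. The function names are illustrative, and scanning a span of exactly L along each axis is an assumption (the text only requires a side length of not less than L).

```python
import math

A_MOS2_NM = 0.3165  # lattice constant of monolayer MoS2 quoted in the text


def moire_period_nm(a_nm, theta_deg):
    """Periodicity length L = a / sqrt(2 * (1 - cos(theta))), as in Eq. (3)."""
    theta = math.radians(theta_deg)
    return a_nm / math.sqrt(2.0 * (1.0 - math.cos(theta)))


def window_origins(span_nm, step_nm):
    """Window origins along one axis, from 0 up to span_nm in steps of step_nm."""
    count = int(span_nm / step_nm + 1e-9) + 1
    return [i * step_nm for i in range(count)]


period = moire_period_nm(A_MOS2_NM, 1.5)
# Assumption: the scanned span equals L exactly; 0.2 Å is expressed as 0.02 nm.
origins = window_origins(period, 0.02)
print(f"L = {period:.2f} nm")
print(f"{len(origins)} origins per axis, {len(origins) ** 2} patches in 2D")
```

For a 1.5° twist the function returns a value of about 12.1 nm, close to the 12.08 nm quoted above, and the 0.2 Å step gives several hundred window origins per axis, which is why a single moiré pattern already supplies a large training set for one twist angle.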

**Fig. 5: Automated structural analysis of twist-stacked vdW bilayers.**

The high-quality simulated data enables the inference model to accurately identify the twist angles of closely resembling moiré patterns. Figure 5b is an experimental ADF-STEM image of twist-stacked bilayer MoS₂. The 12 reflection spots in its FFT pattern, corresponding to the {100} crystal plane family, show similar intensities (top panel in Fig. 5c), so an inference based on the FFT pattern alone leaves two potential twist angle assignments, 20° and 40° (bottom panels in Fig. 5c). Similar ambiguity arises for moiré patterns of bilayer MoS₂ with twist angles of θ and 60° − θ (θ ∈ (0°, 60°)) (Supplementary Fig. 32). In contrast, our ML algorithm uniquely assigns the twist angle of Fig. 5b as 20°. To judge whether the model inference is correct, we took a closer look at the atomic models of bilayer MoS₂ with twist angles of 20° and 40°, in which the separation distances between adjacent S atomic columns differ subtly (magenta and blue ovals in the middle panel of Fig. 5d). High-quality simulated images for these two twist angles were then generated via the DRIT algorithm (bottom panel in Fig. 5d); the intensity profiles taken along the magenta and blue ovals reveal a more detailed structural difference in the sulfur peak splitting between the two moiré patterns (Fig. 5e). The intensity feature of the experimental image taken at the same location (red oval in the top panel of Fig. 5d) matches well with that of the 20°-twisted simulated image, supporting the credibility of our model’s outcome.

Owing to spontaneous reconstruction and imperfections in the mechanical transfer, the twist angle, and hence the moiré pattern, may exhibit spatial inhomogeneity^60,61. To test whether our model can directly resolve the spatial variation of twist angles in real space, we artificially constructed a large-area atomic model of bilayer MoS₂ (Fig. 5f), in which the twist angles in different small triangular regions switch between 9.8° (blue triangle) and 11.8° (yellow triangle). Figure 5g is the DRIT-generated simulation image corresponding to the atomic model in the black-boxed region. If the FFT is performed on the white-boxed region, in which the moiré pattern is homogeneous, the reflection spots are fuzzy because of the small pattern area (Fig. 5h), making accurate measurement of the twist angle infeasible. If the FFT is applied to the whole image, the reflection spots marked by the white circles split into two closely located subspots corresponding to the 9.8° and 11.8° twist angles, but carry no real-space location information (Fig. 5i). Figure 5j is the twist angle map of Fig. 5g obtained via patch-by-patch inference using our model. It agrees well with the ground truth, with imperfections only at the domain boundaries, where the structure deviates from the intrinsic lattice. The model’s ability to resolve structures with random spatial variation of the twist angle has also been evaluated, yielding an average inference error within 1° (Supplementary Fig. 33).

Discussion

To sum up, we describe a Trident strategy-enhanced DR learning algorithm that addresses a key problem in supervised learning, namely how to easily obtain training data of high quality and large quantity. This is vital for applying supervised learning in scientific fields that suffer from scarce experimental data, heavy time and labor costs of labeling, and highly complex problems. The structural inference model trained on the DRIT-generated high-quality simulated images can directly, rapidly, and accurately determine atomic-scale configurations at vdW interfaces from stacking patterns in STEM images across various materials with different stacking modes (slip and twist), layer numbers (bilayers and trilayers), and imaging states (defect ratios, SNR, contaminations), and has the potential to extend to the analysis of other complicated microstructures. The automated and high-throughput processing ability of the ML method leads to the discovery of a vdW epitaxy mode in which diversified thermodynamically favorable slip stackings with almost continuous variations coexist, illustrating how ML can contribute to the emergence of new knowledge. This work expands the ability of supervised learning from identifying discrete and simple microstructures to analyzing complex and continuously changing motifs. The ML approach demonstrates advantages over human experts in efficiency, accuracy, and the complexity of the problems it can solve, which may revolutionize how atomic configurations in microscopy images are characterized and interpreted, paving the way to fast, accurate, automatic, and statistically grounded information extraction of nanomaterials.

Methods

Growth and transfer of MX₂ (M = Re and Mo, X = S) thin films

ReS₂ and MoS₂ atomically thin layers were grown using hydrogen-free atmospheric pressure chemical vapor deposition (CVD) methods^51,62. For the growth of ReS₂, a space-confined CVD setup was constructed with sodium perrhenate (NaReO₄, 99.95%, RHAWN) and sulfur (S, 99.5%, Sigma-Aldrich, 200 mg) used as precursors. A 0.01 mol/L NaReO₄ aqueous solution was spin-coated onto the c-sapphire substrate surface. Two substrates were placed face-to-face, generating a space-confined reaction microcavity downstream of the furnace. S powder was placed upstream. The rhenium and sulfur sources were maintained at 820 °C and 150 °C, respectively, for 15 min to produce ReS₂, with Ar used as the carrier gas. For the growth of MoS₂, molybdenum trioxide (MoO₃, 99.5%, Sigma-Aldrich) and sulfur (S, 99.5%, Sigma-Aldrich) powders were used as precursors. To avoid quenching of the MoO₃ powder by S vapor during the reaction, an inner tube containing the MoO₃ powder was inserted into the outer 1 in. quartz tube, where the S powder was positioned upstream. Two furnaces were used to give independent temperature control over the two precursors and the substrate. The typical heating temperatures for S, MoO₃, and the SiO₂/Si substrate were ~180, ~300, and ~800 °C, respectively, with a growth time of ~20 min.

For the transfer of MX₂ thin layers, a thin film of poly(methyl methacrylate) (PMMA) was first spin-coated on the MX₂/substrate surface. The specimen was then gently floated on a 2 mol/L potassium hydroxide (KOH) solution. Once the PMMA/MX₂ film detached from the substrate, it was transferred to deionized water three times to thoroughly remove residues left by the etchant. Next, the film was transferred to a TEM grid, dried naturally in air, and baked on a hotplate at 180 °C for 15 min. The PMMA scaffold was finally removed by submerging the TEM grid in acetone for 8 h.

STEM characterization

For the slip stacking of bilayer ReS₂, ADF-STEM imaging was conducted on an aberration-corrected Titan Cubed Themis G2 300 at an accelerating voltage of 300 kV. Conditions were a condenser lens aperture of 50 mm, a convergence semi-angle of 21.3 mrad, and a collection angle of 39–200 mrad. The dwell time of a single frame was 2 μs per pixel. A pixel size of 0.012 nm px⁻¹ and a beam current of 30 pA were used for imaging. For the twist stacking of bilayer MoS₂, ADF-STEM imaging was conducted using an aberration-corrected JEOL ARM300CF STEM equipped with a JEOL ETA corrector, operated at an accelerating voltage of 60 kV and located in the Electron Physical Sciences Imaging Centre (ePSIC) at Diamond Light Source. Dwell times of 5–20 μs and a pixel size of 0.006 nm px⁻¹ were used for imaging. The optical conditions were a CL aperture of 30 μm, a convergence semi-angle of 31.5 mrad, a beam current of 44 pA, and inner–outer acquisition angles of 49.5–198 mrad. These results show that ADF-STEM images captured by different types of STEM instruments under various conditions can be well simulated using the Trident strategy-enhanced DRIT algorithm.

Noise-free STEM image simulation

Simulated STEM images were generated by the open-source ‘incoSTEM’ package in Computem. These images can be generated for any material and stacking pattern based on predefined structure files. The structure files were generated in the following four steps using the ‘ASE’ package: (1) Build a model of the first layer of atoms, consisting of 20 × 20 unit cells. (2) Replicate the first layer and shift the atomic coordinates of the replica in the z-direction to obtain the second layer of atoms. (3) Displace the replicated second layer along the two in-plane basis vector directions, or rotate it as a whole about the z-axis, to generate slip- or twist-stacked bilayer models, respectively. (4) Record the coordinates of the constructed bilayer models as the structure files, and the slip coordinates (D_a, D_b) and the twist angles θ as the labels. These steps were integrated into an automated workflow that generates simulated images at an average rate of 1800 images per hour.
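The four construction steps can be illustrated without any external package. The sketch below is not the authors' ASE workflow: it places a single lattice point per cell on a hexagonal grid (basis atoms are omitted), treats the slip coordinates as fractions of the in-plane basis vectors, and uses a hypothetical interlayer offset of 6.0 Å. All function names are illustrative.

```python
import math


def build_layer(a, n=20, z=0.0):
    """Step 1: one lattice point per cell on an n x n hexagonal grid."""
    a1 = (a, 0.0)
    a2 = (-0.5 * a, 0.5 * math.sqrt(3.0) * a)
    points = [(i * a1[0] + j * a2[0], i * a1[1] + j * a2[1], z)
              for i in range(n) for j in range(n)]
    return points, a1, a2


def slip_layer(layer, a1, a2, d_a, d_b, dz):
    """Steps 2-3 (slip): copy the layer, lift it by dz, shift by d_a*a1 + d_b*a2."""
    sx = d_a * a1[0] + d_b * a2[0]
    sy = d_a * a1[1] + d_b * a2[1]
    return [(x + sx, y + sy, z + dz) for x, y, z in layer]


def twist_layer(layer, theta_deg, dz):
    """Steps 2-3 (twist): copy the layer, lift it by dz, rotate about the z-axis."""
    c = math.cos(math.radians(theta_deg))
    s = math.sin(math.radians(theta_deg))
    return [(c * x - s * y, s * x + c * y, z + dz) for x, y, z in layer]


# Step 4: store coordinates together with the label.
bottom, a1, a2 = build_layer(a=3.165)  # lattice constant in Å
slip_sample = {
    "atoms": bottom + slip_layer(bottom, a1, a2, 0.25, 0.10, dz=6.0),  # dz is assumed
    "label": (0.25, 0.10),  # fractional (D_a, D_b), an assumed convention
}
twist_sample = {
    "atoms": bottom + twist_layer(bottom, 20.0, dz=6.0),
    "label": 20.0,  # twist angle in degrees
}
print(len(slip_sample["atoms"]), len(twist_sample["atoms"]))
```

Because the label is recorded at the moment the structure is built, every simulated image derived from it is annotated for free, which is what makes the large labelled dataset inexpensive.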

Data selection

High-dimensional STEM image features were first extracted using a pre-trained VGG16 model. t-SNE then mapped them onto a 2D space by optimizing the positions of the data points to minimize D_KL between the high- and low-dimensional distributions. The key t-SNE hyperparameters were the number of components, set to 2, and the perplexity, set to 12. K-means clustering was then used to divide the data into K = 2 non-overlapping clusters using the Euclidean distance in the 2D t-SNE space. According to the clustering results, the simulated images that fall in the same cluster as the experimental images and lie close to them were selected as the training data for the DRIT algorithm.
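The final selection rule can be sketched as follows, assuming the 2D t-SNE coordinates and the K-means cluster labels have already been computed elsewhere. The distance threshold `max_dist` is a hypothetical parameter: the text states only that selected images are "close to" the experimental ones.

```python
import math
from collections import Counter


def select_simulated(sim_xy, sim_cluster, exp_xy, exp_cluster, max_dist):
    """Indices of simulated points that share the experimental cluster and lie
    within max_dist of at least one experimental point in the 2D embedding."""
    target = Counter(exp_cluster).most_common(1)[0][0]
    keep = []
    for index, (point, cluster) in enumerate(zip(sim_xy, sim_cluster)):
        if cluster != target:
            continue
        nearest = min(math.dist(point, q) for q in exp_xy)
        if nearest <= max_dist:
            keep.append(index)
    return keep


# Toy embedding: three simulated points in cluster 0, one in cluster 1.
sim_xy = [(0.0, 0.0), (1.0, 0.5), (9.0, 9.0), (4.0, 0.0)]
sim_cluster = [0, 0, 1, 0]
exp_xy = [(0.5, 0.0), (0.0, 1.0)]
exp_cluster = [0, 0]
selected = select_simulated(sim_xy, sim_cluster, exp_xy, exp_cluster, max_dist=1.5)
print(selected)
```

In this toy case the third point is rejected for being in the other cluster and the fourth for being too far from any experimental point, so only the first two are kept.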

Trident strategy-enhanced DRIT model training

The training data of the Trident strategy-enhanced DRIT model consist of unpaired experimental STEM images and noise-free simulated images. The unlabeled raw experimental images used for model training of slip-stacked bilayer ReS₂, ReSe₂, and twist-stacked MoS₂ are in each case three small images involving ~900 atoms in total, while the noise-free simulated STEM images number 41 for slip-stacked ReS₂, 32 for slip-stacked ReSe₂, and 119 for twist-stacked MoS₂. For each dataset, we normalized all image intensities to [0, 1] by their minimum and maximum values and resized the images to 1024 × 1024 pixels. Before being fed into training, each patch was first rotated by an angle in the range of −45° to 45°, then randomly cropped to a size between 462 × 462 and 612 × 612 pixels and resized to 1024 × 1024 pixels, and finally flipped (vertically or horizontally) with a probability of 50% for data augmentation. The training procedure used the Adam optimizer with a batch size of 2. All networks of the DRIT model were trained for 3000 epochs, where the learning rate was 1.0 in the first 600 epochs and decreased linearly to 0 from epoch 601 to epoch 3000. Supplementary Figs. 3 and 8a show the detailed model architecture and loss functions.
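The normalization and the random augmentation parameters described above can be sketched as below. This only draws the parameters and leaves the image operations themselves to an imaging library. The dictionary keys are illustrative, and drawing the flip axis uniformly is an assumption.

```python
import random


def min_max_normalize(pixels):
    """Rescale intensities to [0, 1] using the minimum and maximum values."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]


def sample_augmentation(rng):
    """Draw one set of augmentation parameters within the ranges given in the text."""
    return {
        "angle_deg": rng.uniform(-45.0, 45.0),   # rotation range
        "crop_px": rng.randint(462, 612),         # square crop side before resizing
        "flip": rng.random() < 0.5,               # flip with 50% probability
        "flip_axis": rng.choice(["vertical", "horizontal"]),  # assumed uniform choice
        "out_px": 1024,                           # final side length after resizing
    }


rng = random.Random(0)
normalized = min_max_normalize([12.0, 30.0, 48.0])
params = sample_augmentation(rng)
print(normalized, params)
```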

Evaluation metrics for simulated image quality

D_KL, the Kullback-Leibler divergence, is a statistical measure of the difference between two probability distributions (P and Q). The more similar the two distributions are, the closer D_KL is to 0. It is formulated as follows:

$$D_{\mathrm{KL}}(P\,\|\,Q)=\sum P\log \left(\frac{P}{Q}\right)$$

(4)
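A minimal sketch of Eq. (4) for two discrete distributions, such as intensity histograms, is given below. Normalizing the inputs so that each sums to 1, using the natural logarithm, and flooring Q with a small `eps` to avoid division by zero are all assumptions that the text does not specify.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_i P_i * log(P_i / Q_i) for discrete distributions."""
    total_p, total_q = sum(p), sum(q)
    result = 0.0
    for p_i, q_i in zip(p, q):
        p_i, q_i = p_i / total_p, q_i / total_q
        if p_i > 0.0:  # terms with P_i = 0 contribute nothing
            result += p_i * math.log(p_i / max(q_i, eps))
    return result


same = kl_divergence([0.2, 0.3, 0.5], [0.2, 0.3, 0.5])
different = kl_divergence([0.5, 0.5], [0.9, 0.1])
print(same, different)
```

Identical distributions give 0, and the value grows as the two histograms diverge. Note that D_KL is not symmetric in P and Q.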

R_a represents the arithmetic mean of the absolute values of the ordinate Z(x) within the sampling length l and reflects the average fluctuation of the curve. The more similar two curves are, the closer their R_a values are. It is formulated as follows:

$$R_{\mathrm{a}}=\frac{1}{l}\int_{0}^{l}\left|Z(x)\right|\,dx$$

(5)
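For a uniformly sampled intensity profile, the integral in Eq. (5) reduces to a mean of absolute values. The sketch below assumes uniform sampling. The optional `center` flag, which measures Z(x) from the mean line of the profile, is an assumption about how the ordinate is defined.

```python
def roughness_ra(profile, center=False):
    """Discrete R_a: mean absolute value of the sampled ordinate Z(x).

    With center=True the profile mean is subtracted first (assumed convention).
    """
    values = list(profile)
    if center:
        mean = sum(values) / len(values)
        values = [v - mean for v in values]
    return sum(abs(v) for v in values) / len(values)


ra_raw = roughness_ra([1.0, -1.0, 1.0, -1.0])
ra_centered = roughness_ra([2.0, 0.0, 2.0, 0.0], center=True)
print(ra_raw, ra_centered)
```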

PSNR measures the quality of the noise-added images in Fig. 4c relative to the original DRIT-generated images, expressed in decibels (dB). Higher PSNR values typically indicate better image quality. It is calculated as follows:

$$\mathrm{PSNR}=10\times \log_{10}\left(\frac{\mathrm{MAX}^{2}}{\mathrm{MSE}}\right)$$

(6)

where MAX is the maximum possible pixel value and MSE is the mean squared error between the original and the noised image.
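Eq. (6) can be evaluated on flattened pixel lists as in the sketch below. Setting MAX to 1.0 by default is an assumption, motivated by the normalization of image intensities to [0, 1] described earlier.

```python
import math


def psnr(original, noised, max_value=1.0):
    """PSNR = 10 * log10(MAX^2 / MSE) in dB; infinite when the images are identical."""
    mse = sum((a - b) ** 2 for a, b in zip(original, noised)) / len(original)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


clean = [0.0, 0.5, 1.0, 0.5]
noisy = [v + 0.1 for v in clean]  # uniform offset of 0.1, so MSE = 0.01
value_db = psnr(clean, noisy)
print(f"{value_db:.2f} dB")
```

With a uniform offset of 0.1 the mean squared error is 0.01, which corresponds to 20 dB for MAX = 1.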

Stacking structure inference model training

An improved ResNet-50 architecture was used as the backbone of the inference network to achieve end-to-end quantitative analysis of stacking structures. Fully connected layers of [1000, 512, n] were used as the output head of the model, where n was set to 1 for the twist stacking task, 2 for the bilayer slip stacking task, and 4 for the trilayer slip stacking task. Common data augmentation techniques were applied, including random cropping, scaling, flipping, and rotating. To further improve the inference model’s robustness to defects, surface contamination, and varying imaging conditions (different doses), additional data augmentations were employed during training, such as adding random proportions of defects, masks, contaminations, and Gaussian noise to the training set. The training data used here are DRIT-generated simulation images of the pristine lattice with a range of atoms masked and different levels of noise added, which mimic materials that have defects or contaminations or are imaged under different doses. The dose variation is realized by adding noise on top of the DRIT-generated images. The loss function was designed as follows:

$$\mathrm{Loss}=\lambda_{\mathrm{MSE}}L_{\mathrm{MSE}}(\mathrm{inf},\mathrm{gt})+\lambda_{\mathrm{L1}}L_{\mathrm{L1}}(\mathrm{inf},\mathrm{gt})$$

(7)

where inf represents the model output, gt represents the ground truth label of the input data, λ_MSE was set to 1, and λ_L1 was set to 0.5. The training procedure used the Adam optimizer with a batch size of 32. The ResNet-50 was trained for 3000 epochs; the learning rate increased linearly from 0 to 10⁻³ over the first 60 epochs and then decreased linearly from 10⁻³ to 0 from epoch 61 to epoch 3000.
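The loss in Eq. (7) and the warm-up and decay schedule can be written in a few lines. This is a framework-free sketch: averaging both loss terms over the output components and counting epochs from 1 are assumptions, and in practice these would be expressed with the deep learning library used for training.

```python
def combined_loss(pred, target, lam_mse=1.0, lam_l1=0.5):
    """Loss = lam_mse * MSE + lam_l1 * L1, both averaged over the outputs (assumed)."""
    n = len(pred)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    l1 = sum(abs(p - t) for p, t in zip(pred, target)) / n
    return lam_mse * mse + lam_l1 * l1


def learning_rate(epoch, peak=1e-3, warmup=60, total=3000):
    """Linear warm-up to `peak` over `warmup` epochs, then linear decay to 0.

    `epoch` is 1-based, so epoch 60 reaches the peak and epoch 3000 reaches 0.
    """
    if epoch <= warmup:
        return peak * epoch / warmup
    return peak * (total - epoch) / (total - warmup)


loss = combined_loss([1.0, 2.0], [0.0, 2.0])  # e.g. inferred vs true (D_a, D_b)
schedule = [learning_rate(e) for e in (1, 30, 60, 1530, 3000)]
print(loss, schedule)
```

Combining a squared term with an absolute term penalizes large errors strongly while keeping a non-vanishing gradient for small ones, which is one plausible reason for mixing the two, although the text does not state the motivation.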

Evaluation metrics for structure inference model

The inference accuracy of the slip stacking structure inference model is evaluated by ΔD, the Euclidean distance between the slip vector inferred by the model (D_inf) and the ground truth slip vector (D_gt). For a large number of STEM images, we calculate ΔD for each image and show the overall inference accuracy with box plots. In Fig. 3b, the central square markers represent the mean. The color-shaded boxes span the 5th to 95th percentiles, with the whiskers extending to 3 times the distance between the 5th and 95th percentiles. In Fig. 4b, the boxes span the 20th to 80th percentiles, with the whiskers extending to 1 times the distance between the 20th and 80th percentiles. In Fig. 4c and Supplementary Fig. 21a, b, the boxes span the 10th to 90th percentiles, with the whiskers extending to 1.5 times the distance between the 10th and 90th percentiles. In Supplementary Fig. 23c, the boxes span the 10th to 90th percentiles, with the whiskers extending to 2 times the distance between the 10th and 90th percentiles. In Fig. 4f, the error bar represents the standard deviation of ∆D measured at three different positions of a pair of high and low SNR images. In addition, the inference accuracy of the twist stacking structure inference model is evaluated by |Δθ|, the absolute difference between the twist angle inferred by the model (θ_inf) and the ground truth twist angle (θ_gt).
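The two error metrics and the percentile bounds used for the boxes can be sketched as follows. The linear-interpolation percentile is an assumption, since the text does not say which percentile convention the plotting software used, and the sample errors are made-up numbers for illustration only.

```python
import math


def delta_d(d_inf, d_gt):
    """Euclidean distance between inferred and ground-truth slip vectors."""
    return math.dist(d_inf, d_gt)


def abs_delta_theta(theta_inf, theta_gt):
    """Absolute difference between inferred and ground-truth twist angles."""
    return abs(theta_inf - theta_gt)


def percentile(values, q):
    """q-th percentile with linear interpolation between sorted samples (assumed)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


error = delta_d((0.3, 0.4), (0.0, 0.0))
angle_error = abs_delta_theta(19.6, 20.0)
errors = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative values only
box = (percentile(errors, 10), percentile(errors, 90))
print(error, angle_error, box)
```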

Strided pattern matching technique (SPMT)

An ADF-STEM image containing thousands of atoms is analyzed by combining the structure inference model with SPMT. As shown in Supplementary Fig. 34, the input STEM image is sampled at a predefined window size and stride length to obtain small patches that reflect local structural features. These patches are analyzed by the inference model to obtain the slip coordinates or twist angles, and the outputs are then assembled into a 2D map, which shows subtle structural changes and transitions of the stacking pattern across the entire image. SPMT has two important parameters. The first is the window size, which is set to 2–5 nm. As long as the limit scale of the model is not exceeded, a smaller window reflects finer local structural features and makes structural variations between different regions easier to detect. The second is the stride length, which determines the resolution of the local analysis and can be chosen freely according to the task requirements. As the stride length decreases, the number of patches in the ADF-STEM image, and with it the total computation time, grows rapidly (roughly with the inverse square of the stride for a 2D scan). We therefore applied SPMT with two levels of step size, first coarsely locating the region containing pattern transition interfaces and then accurately identifying the detailed interface position, which strikes a good balance between recognition accuracy and computational cost. As shown in Supplementary Fig. 17, we first chose 1 nm as the initial coarse step to obtain a low-resolution map, which is sufficient to identify the interface region of interest. Then, for these regions, the step size is reduced to 1 Å, which is close to the length of a chemical bond. A high-resolution map of the interface region is thus obtained, and the interface is accurately located.
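The patch sampling behind SPMT, and the cost difference between the coarse and fine strides, can be illustrated with the sketch below. Here `toy_infer` is a hypothetical stand-in for the trained inference model, and the 20 nm image size and 2 nm window are example values rather than settings taken from the experiments.

```python
def patch_origins(image_nm, window_nm, stride_nm):
    """Patch start coordinates along one axis that keep the window inside the image."""
    count = int((image_nm - window_nm) / stride_nm + 1e-9) + 1
    return [i * stride_nm for i in range(count)]


def spmt_map(width_nm, height_nm, window_nm, stride_nm, infer):
    """Apply infer(x, y, window) to every patch; rows follow y, columns follow x."""
    xs = patch_origins(width_nm, window_nm, stride_nm)
    ys = patch_origins(height_nm, window_nm, stride_nm)
    return [[infer(x, y, window_nm) for x in xs] for y in ys]


def toy_infer(x_nm, y_nm, window_nm):
    """Hypothetical model: two twist domains separated at x = 10 nm (patch center)."""
    return 9.8 if x_nm + window_nm / 2.0 < 10.0 else 11.8


coarse = spmt_map(20.0, 20.0, 2.0, 1.0, toy_infer)      # 1 nm stride
coarse_count = len(coarse) * len(coarse[0])
fine_count = len(patch_origins(20.0, 2.0, 0.1)) ** 2     # 1 Å stride, same area
print(coarse_count, "coarse patches vs", fine_count, "fine patches")
```

For this example the coarse pass needs 361 inferences while a fine pass over the whole image would need 32761, so restricting the 1 Å stride to the interface region found by the coarse map saves most of the computation.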

DFT calculations

DFT calculations were performed using the local density approximation (LDA)⁶³ as implemented in the Vienna Ab initio Simulation Package (VASP)^64,65. The plane-wave cutoff energy was set to 400 eV. The convergence criterion for the total energy was set to 10⁻⁵ eV, and atoms were relaxed until the Hellmann-Feynman forces were less than 0.001 eV Å⁻¹. The vacuum layer was set to be larger than 10 Å. A k-mesh of 6 × 6 × 1 was adopted for ReS₂. Note that only the vertical coordinates were allowed to relax during the sliding.

Data availability

The experimental and simulation data for Trident strategy-enhanced DRIT model training and the DRIT-generated data for stacking structure inference model training are available on Zenodo (https://doi.org/10.5281/zenodo.11446947).

Code availability

Code for the DRIT model and the stacking structure inference model is available on GitHub (https://github.com/dptech-corp/TED-Gen). The stacking structure inference app is available at https://bohrium.dp.tech/apps/stacking-pattern-analyzer. The user’s manual is described in Supplementary Fig. 35.

References

Cao, Y. et al. Correlated insulator behaviour at half-filling in magic-angle graphene superlattices. Nature 556, 80–84 (2018).
Cao, Y. et al. Unconventional superconductivity in magic-angle graphene superlattices. Nature 556, 43–50 (2018).
Weston, A. et al. Interfacial ferroelectricity in marginally twisted 2D semiconductors. Nat. Nanotechnol. 17, 390–395 (2022).
Wang, X. et al. Interfacial ferroelectricity in rhombohedral-stacked bilayer transition metal dichalcogenides. Nat. Nanotechnol. 17, 367–371 (2022).
Huang, S. et al. Topologically protected helical states in minimally twisted bilayer graphene. Phys. Rev. Lett. 121, 037702 (2018).
Xie, F., Song, Z., Lian, B. & Bernevig, B. A. Topology-bounded superfluid weight in twisted bilayer graphene. Phys. Rev. Lett. 124, 167002 (2020).
Bao, W. et al. Stacking-dependent band gap and quantum transport in trilayer graphene. Nat. Phys. 7, 948–952 (2011).
Zibrov, A. A. et al. Emergent Dirac gullies and gully-symmetry-breaking quantum Hall states in ABA trilayer graphene. Phys. Rev. Lett. 121, 167601 (2018).
Zhou, H., Xie, T., Taniguchi, T., Watanabe, K. & Young, A. F. Superconductivity in rhombohedral trilayer graphene. Nature 598, 434–438 (2021).
Han, T. et al. Correlated insulator and Chern insulators in pentalayer rhombohedral-stacked graphene. Nat. Nanotechnol. 19, 181–187 (2024).
Song, T. et al. Switching 2D magnetic states via pressure tuning of layer stacking. Nat. Mater. 18, 1298–1302 (2019).
Han, X. et al. Atomically unveiling an atlas of polytypes in transition-metal trihalides. J. Am. Chem. Soc. 145, 3624–3635 (2023).
Pakdel, S. et al. High-throughput computational stacking reveals emergent properties in natural van der Waals bilayers. Nat. Commun. 15, 932 (2024).
Zhao, X. et al. Edge segregated polymorphism in 2D molybdenum carbide. Adv. Mater. 31, 1808343 (2019).
Wang, S. et al. Atomic-scale studies of overlapping grain boundaries between parallel and quasi-parallel grains in low-symmetry monolayer ReS₂. Matter 3, 2108–2123 (2020).
Zhao, X. et al. Unveiling atomic-scale moiré features and atomic reconstructions in high-angle commensurately twisted transition metal dichalcogenide homobilayers. Nano Lett. 21, 3262–3270 (2021).
Li, S. et al. Growth anisotropy and morphology evolution of line defects in monolayer MoS₂: atomic-level observation, large-scale statistics, and mechanism understanding. Small 20, 2303511 (2024).
Sasaki, T. et al. Performance of low-voltage STEM/TEM with delta corrector and cold field emission gun. Microscopy 59, 7–13 (2010).
Ennos, A. E. The origin of specimen contamination in the electron microscope. Br. J. Appl. Phys. 4, 101–106 (1953).
Belianinov, A. et al. Identification of phases, symmetries and defects through local crystallography. Nat. Commun. 6, 7801 (2015).
Maksov, A. et al. Deep learning analysis of defect and phase evolution during electron beam-induced transformations in WS₂. npj Comput. Mater. 5, 12–19 (2019).
Guo, Y. et al. Defect detection in atomic-resolution images via unsupervised learning with translational invariance. npj Comput. Mater. 7, 180–188 (2021).
Dan, J. et al. Learning motifs and their hierarchies in atomic resolution microscopy. Sci. Adv. 8, 1005–1014 (2022).
Ziatdinov, M. et al. Deep learning of atomically resolved scanning transmission electron microscopy images: chemical identification and tracking local transformations. ACS Nano 11, 12742–12752 (2017).
Madsen, J. et al. A deep learning approach to identify local structures in atomic-resolution transmission electron microscopy images. Adv. Theory Simul. 1, 1800037 (2018).
Leist, C., He, M., Liu, X., Kaiser, U. & Qi, H. Deep-learning pipeline for statistical quantification of amorphous two-dimensional materials. ACS Nano 16, 20488–20496 (2022).
Lin, Y. et al. A multiscale deep-learning model for atom identification from low-signal-to-noise-ratio transmission electron microscopy Images. Small Sci. 3, 2300031 (2023).
Ni, H. et al. Quantifying atomically dispersed catalysts using deep learning assisted microscopy. Nano Lett. 23, 7442–7448 (2023).
Lin, R., Zhang, R., Wang, C., Yang, X. Q. & Xin, H. L. TEMImageNet training library and AtomSegNet deep-learning models for high-precision atom segmentation, localization, denoising, and deblurring of atomic-resolution images. Sci. Rep. 11, 5386 (2021).
Lee, K. et al. STEM image analysis based on deep learning: identification of vacancy defects and polymorphs of MoS₂. Nano Lett. 22, 4677–4685 (2022).
Yang, S. H. et al. Deep learning-assisted quantification of atomic dopants and defects in 2D materials. Adv. Sci. 8, 2101099 (2021).
Khan, A., Lee, C.-H., Huang, P. Y. & Clark, B. K. Leveraging generative adversarial networks to create realistic scanning transmission electron microscopy images. npj Comput. Mater. 9, 85–93 (2023).
Akers, S. et al. Rapid and flexible segmentation of electron microscopy data using few-shot machine learning. npj Comput. Mater. 7, 187–195 (2021).
Leitherer, A., Yeo, B. C., Liebscher, C. H. & Ghiringhelli, L. M. Automatic identification of crystal structures and interfaces via artificial-intelligence-based electron microscopy. npj Comput. Mater. 9, 179–189 (2023).
Kaufmann, K. et al. Crystal symmetry determination in electron diffraction using machine learning. Science 367, 564–568 (2020).
Ziletti, A., Kumar, D., Scheffler, M. & Ghiringhelli, L. M. Insightful classification of crystal structures using deep learning. Nat. Commun. 9, 2775–2784 (2018).
Leitherer, A., Ziletti, A. & Ghiringhelli, L. M. Robust recognition and exploratory analysis of crystal structures via Bayesian deep learning. Nat. Commun. 12, 6234–6246 (2021).
Li, J. et al. Machine vision automated chiral molecule detection and classification in molecular imaging. J. Am. Chem. Soc. 143, 10177–10188 (2021).
Hao, H. et al. Chiral stacking identification of two-dimensional triclinic crystals enabled by machine learning. ACS Nano 18, 13858–13865 (2024).
Munshi, J. et al. Disentangling multiple scattering with deep learning: application to strain mapping from electron diffraction patterns. npj Comput. Mater. 8, 254–268 (2022).
Xu, T. et al. Structural evolution of atomically thin 1T’‐MoTe₂ alloyed in chalcogen atmosphere. Small Struct. 3, 2200025 (2022).
Wang, S. et al. Detailed atomic reconstruction of extended line defects in monolayer MoS₂. ACS Nano 10, 5419–5430 (2016).
Ophus, C., Ciston, J. & Nelson, C. T. Correcting nonlinear drift distortion of scanning probe and scanning transmission electron microscopies from image pairs with orthogonal scan directions. Ultramicroscopy 162, 1–9 (2016).
Rudnaya, M. E., Van den Broek, W., Doornbos, R. M. P., Mattheij, R. M. M. & Maubach, J. M. L. Defocus and twofold astigmatism correction in HAADF-STEM. Ultramicroscopy 111, 1043–1054 (2011).
Lee, H. Y., Tseng, H. Y., Huang, J. B., Singh, M. & Yang, M. H. Diverse image-to-image translation via disentangled representations. Proc. Eur. Conf. Computer Vis. 11205, 36–52 (2018).
Yang, H. et al. Unpaired brain MR-to-CT synthesis using a structure-constrained CycleGAN. Deep Learn. Med. Image Anal. Multimodal Learn. Clin. Decis. Support (DLMIA ML-CDS) 11045, 174–182 (2018).
Kirkland, E. J. Advanced Computing in Electron Microscopy. (Springer, 2020).
Stadelmann, P. A. EMS - a software package for electron diffraction analysis and HREM image simulation in materials science. Ultramicroscopy 21, 131–145 (1987).
Kazmierczak, N. P. et al. Strain fields in twisted bilayer graphene. Nat. Mater. 20, 956–963 (2021).
Zachman, M. J. et al. Interferometric 4D-STEM for lattice distortion and interlayer spacing measurements of bilayer and trilayer 2D materials. Small 17, 2100388 (2021).
Hu, P. et al. Lateral and vertical morphology engineering of low-symmetry, weakly-coupled 2D ReS₂. Adv. Funct. Mater. 33, 2210502 (2023).
Chen, Y. et al. Constructing slip stacking diversity in van der Waals homobilayers. Adv. Mater. 36, 2404734 (2024).
Hibino, H., Mizuno, S., Kageshima, H., Nagase, M. & Yamaguchi, H. Stacking domains of epitaxial few-layer graphene on SiC(0001). Phys. Rev. B 80, 085406 (2009).
Liu, L. et al. High-yield chemical vapor deposition growth of high-quality large-area AB-stacked bilayer graphene. ACS Nano 6, 8241–8249 (2012).
Wang, S., Sawada, H., Allen, C. S., Kirkland, A. I. & Warner, J. H. Orientation dependent interlayer stacking structure in bilayer MoS₂ domains. Nanoscale 9, 13060–13068 (2017).
Zhang, X. et al. Transition metal dichalcogenides bilayer single crystals by reverse-flow chemical vapor epitaxy. Nat. Commun. 10, 598 (2019).
Egerton, R. F. Control of radiation damage in the TEM. Ultramicroscopy 127, 100–108 (2013).
Meyer, J. C. et al. Accurate measurement of electron beam induced displacement cross sections for single-layer graphene. Phys. Rev. Lett. 108, 196102 (2012).
Xu, M. et al. Reconfiguring nucleation for CVD growth of twisted bilayer MoS₂ with a wide range of twist angles. Nat. Commun. 15, 562 (2024).
Yoo, H. et al. Atomic and electronic reconstruction at the van der Waals interface in twisted bilayer graphene. Nat. Mater. 18, 448–453 (2019).
Weston, A. et al. Atomic reconstruction in twisted bilayers of transition metal dichalcogenides. Nat. Nanotechnol. 15, 592–597 (2020).
Li, S. et al. Growth mechanism and atomic structure of group-IIA compound-promoted CVD-synthesized monolayer transition metal dichalcogenides. Nanoscale 13, 13030–13041 (2021).
Ceperley, D. M. & Alder, B. J. Ground state of the electron gas by a stochastic method. Phys. Rev. Lett. 45, 566–569 (1980).
Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169–11186 (1996).
Kresse, G. & Furthmüller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6, 15–50 (1996).

Acknowledgements

S.W. acknowledges support from the National Natural Science Foundation of China (52222201, 52172032, 22494464013), Young Elite Scientist Sponsorship Program by CAST (YESS20200222), Hunan Natural Science Foundation (2022JJ20044), Shenzhen Science and Technology Innovation Commission Project (KQTD20221101115627004), and National University of Defense Technology (ZZCX-ZZGC-01-07). J.Z. acknowledges support from the Ministry of Science and Technology of China (2022YFA1203302, 2022YFA1203304 and 2018YFA0703502), the National Natural Science Foundation of China (Grant Nos. 52021006, 22494464010), the Strategic Priority Research Program of CAS (XDB36030100), the Beijing National Laboratory for Molecular Sciences (BNLMS-CXTD-202001) and the Shenzhen Science and Technology Innovation Commission (KQTD20221101115627004). F.O. acknowledges support from the Key Project of the Natural Science Program of Xinjiang Uygur Autonomous Region (Grant No. 2023D01D03).

Author information

These authors contributed equally: Wenqiang Huang, Yucheng Jin, Zhemin Li.

Authors and Affiliations

School of Advanced Materials, Peking University Shenzhen Graduate School, Shenzhen, China
Wenqiang Huang, Jin Zhang & Shanshan Wang
Department of Materials Science and Engineering, College of Aerospace Science and Engineering, National University of Defense Technology, Changsha, China
Wenqiang Huang, Yun Chen, Zheng Luo & Shanshan Wang
School of Physics, Central South University, Changsha, China
Wenqiang Huang & Fangping Ouyang
State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China
Yucheng Jin & Jun Cheng
Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen, China
Yucheng Jin & Jun Cheng
Institute of Artificial Intelligence, Xiamen University, Xiamen, China
Yucheng Jin & Jun Cheng
College of Science, National University of Defense Technology, Changsha, China
Zhemin Li & Shen Zhou
DP Technology, Beijing, China
Lin Yao, Zhifeng Gao & Linfeng Zhang
State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing, China
Jinguo Lin & Feng Liu
AI for Science Institute, Beijing, China
Linfeng Zhang
Guangdong Provincial Key Laboratory of Nano-Micro Materials Research, Peking University Shenzhen Graduate School, Shenzhen, China
Shanshan Wang

Authors

Wenqiang Huang
Yucheng Jin
Zhemin Li
Lin Yao
Yun Chen
Zheng Luo
Shen Zhou
Jinguo Lin
Feng Liu
Zhifeng Gao
Jun Cheng
Linfeng Zhang
Fangping Ouyang
Jin Zhang
Shanshan Wang

Contributions

S.W. and J.Z. initiated the project and generated the experimental protocols. W.H., Y.J., Z.G., and L.Y. wrote the code. S.W. prepared the samples and captured the experimental STEM images. J.L. and F.L. conducted DFT calculations. Y.C., Z.L., S.Z., J.C., Zheng Luo, and F.O. discussed the work and gave suggestions. All authors contributed to the data analysis, manuscript writing, and revision of the manuscript.

Corresponding authors

Correspondence to Lin Yao, Fangping Ouyang, Jin Zhang or Shanshan Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Huang, W., Jin, Y., Li, Z. et al. Auto-resolving the atomic structure at van der Waals interfaces using a generative model. Nat Commun 16, 2927 (2025). https://doi.org/10.1038/s41467-025-58160-3

Received: 12 February 2025
Accepted: 13 March 2025
Published: 25 March 2025
DOI: https://doi.org/10.1038/s41467-025-58160-3