This application claims the benefit of and priority to U.S. provisional application No. 63/015,101, filed April 24, 2020.
Disclosure of Invention
The invention provides a method and a system for classifying defects in a wafer using wafer defect images based on a deep learning network. Implementations herein use the synergy between multiple modes of wafer defect images to make classification decisions. Furthermore, by adding a mixture of modes, information can be obtained from different sources, such as color images, intra-crack imaging (ICI) images, black and white images, etc., to classify the defect images. In addition to the mixture of modes, a reference image (e.g., a golden die image) may be used for each mode. The advantage of providing a reference image for each mode image is that classification focuses on the defect itself rather than on the underlying lithography of the defect image. Furthermore, the reference images may be provided to the training process of the deep learning model, which may significantly reduce the number of labeled images and the training time required for the deep learning model to converge (i.e., the time for the entire data set to be passed forward and backward through the deep learning neural network).
Embodiments herein may use a Directed Acyclic Graph (DAG) as a combination of deep learning models, where each deep learning model may use a defect wafer image to address a different aspect of the problem or a different form of defect in the wafer. Further, the created DAG may have any number of models, and each deep learning model may receive multiple different images. Further, a post-processing decision module may be configured to combine parameters such as the result labels and confidence values produced for the defect inspection image by each deep learning model in the DAG, as well as metrology information (metadata) of the defect previously collected by the scanner machine. Based on the deep learning network, a DAG of deep learning models can thus accurately classify wafer defects using the wafer defect images.
The features disclosed in the present invention facilitate accurate detection and classification of defects in wafers during manufacturing by analyzing multiple modes of wafer defect images.
In one aspect, a computer-implemented method of classifying and inspecting defects in a semiconductor wafer comprises: providing one or more imaging units; providing a computing unit; receiving a plurality of images taken from one or more dies on a semiconductor wafer inspected by the one or more imaging units, wherein the plurality of images are captured using a plurality of imaging modes; providing a plurality of Machine Learning (ML) models associated with at least one computer processor, a database, and a memory of the computing unit; identifying and classifying, by the computer processor, one or more defects present in the semiconductor wafer into one or more defect classes using the plurality of ML models, wherein the plurality of ML models are configured in a Directed Acyclic Graph (DAG) architecture, wherein each node in the DAG architecture represents one ML model, and wherein one or more of the ML models are configured as a root node in the DAG architecture; wherein the plurality of ML models are trained to classify one or more defects on the one or more dies in the semiconductor wafer, and wherein the training comprises: providing, to one or more ML models from the plurality of ML models, a plurality of labeled images and a plurality of reference images of the semiconductor wafer stored in the database; and configuring each ML model from the plurality of ML models to classify the plurality of labeled images into one or more defect classes using a corresponding reference image from the plurality of reference images; storing the one or more defect classes; inspecting one or more dies on a semiconductor wafer for defects by imaging the one or more dies; attempting to match the images of the one or more wafers to any of the one or more defect classes; and, if there is a match between the one or more wafers and the one or more defect classes, classifying the one or more matched wafers as defective and transmitting an identification of the one or more defective wafers as a defect rejection.
In another aspect, the one or more ML models receive a plurality of images from a plurality of imaging modes and a plurality of labeled images belonging to the imaging modes. Further, each of the plurality of ML models is one of a supervised model, a semi-supervised model, and an unsupervised model.
In a further aspect, the plurality of modes includes at least one of: x-ray imaging, intra-crack imaging (ICI), grayscale imaging, black and white imaging, and color imaging. Further, the plurality of ML models are deep learning models.
In another aspect, the plurality of labeled images includes labels associated with the one or more defect categories, wherein the plurality of labeled images are generated using a label model.
In additional aspects, the computing unit of the present invention includes one or more processors and memory configured to perform the above-described method steps.
In one aspect, a method of classifying defects in a semiconductor wafer comprises: capturing a plurality of images of a semiconductor wafer inspected by one or more imaging units, wherein the plurality of images are captured using a plurality of imaging modes; providing the plurality of images to one or more Machine Learning (ML) models from a plurality of ML models to identify and classify one or more defects in the semiconductor wafer into one or more defect classes, wherein the plurality of ML models are configured in a Directed Acyclic Graph (DAG) architecture, wherein each node in the DAG architecture represents one ML model, and wherein the one or more ML models are configured as root nodes in the DAG architecture; wherein the plurality of ML models are trained to classify one or more defects in the semiconductor wafer, and wherein the training comprises: providing a plurality of labeled images and a plurality of reference images of the semiconductor wafer stored in a database to one or more ML models from the plurality of ML models; and configuring each ML model from the plurality of ML models to classify the plurality of labeled images into one or more defect classes using a corresponding reference image from the plurality of reference images.
In another aspect, the one or more ML models receive a plurality of images from a plurality of imaging modes and a plurality of labeled images belonging to one imaging mode. Further, each of the plurality of ML models is one of a supervised model, a semi-supervised model, and an unsupervised model. The plurality of modes includes at least one of: x-ray imaging, intra-crack imaging (ICI), grayscale imaging, black and white imaging, and color imaging. Each of the plurality of ML models is a deep learning model.
In another aspect, the plurality of labeled images includes labels associated with the one or more defect categories, wherein the plurality of labeled images are generated using historical images of the semiconductor wafer. Features extracted from the plurality of modes are combined using one of a late fusion technique, an early fusion technique, or a hybrid fusion technique. The method further comprises post-processing, wherein the post-processing comprises accurately classifying the plurality of images into one or more defect classes using the classification information from each of the plurality of ML models.
In one aspect, a system for classifying and inspecting defects in a semiconductor wafer comprises: one or more imaging units configured to capture a plurality of images of one or more dies on a semiconductor wafer inspected by the one or more imaging units, wherein the plurality of images are captured using a plurality of imaging modes; and a computing unit comprising at least a computer processor, a database, and a memory, and configured to: provide the plurality of images to one or more Machine Learning (ML) models from a plurality of ML models to identify and classify one or more defects in the one or more dies on the semiconductor wafer into one or more defect classes, wherein the plurality of ML models are configured in a Directed Acyclic Graph (DAG) architecture, wherein each node in the DAG architecture represents an ML model, wherein the one or more ML models are configured as root nodes in the DAG architecture, and wherein the plurality of ML models are trained to classify one or more defects on the one or more dies in the semiconductor wafer; wherein the computing unit is further configured to: provide a plurality of labeled images and a plurality of reference images of the semiconductor wafer stored in the database to one or more ML models from the plurality of ML models;
configure each ML model from the plurality of ML models to classify the plurality of labeled images into one or more defect classes using corresponding reference images from the plurality of reference images; store the one or more defect categories; and, based on whether an inspected wafer matches any of the one or more defect categories, accept or reject the one or more wafers upon inspection.
In another aspect, the one or more imaging units include an Automatic Optical Inspection (AOI) device, an automatic X-ray inspection (AXI) device, a Joint Test Action Group (JTAG) device, and an in-circuit test (ICT) device. Further, the computing unit receives a plurality of labeled images including labels associated with the one or more defect categories from a label model, wherein the label model generates the plurality of labeled images using historical images of the semiconductor wafer.
In yet another aspect, the features extracted from the plurality of modes are combined using one of a late fusion technique, an early fusion technique, or a hybrid fusion technique. The computing unit is further configured to post-process the outputs of the plurality of ML models, wherein the computing unit accurately classifies the plurality of images into one or more defect categories using the classification information from each of the plurality of ML models.
Detailed Description
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention.
The novel features which are believed to be characteristic of the invention, both as to its organization and method of operation, together with further objects and advantages will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present invention.
Fig. 1 illustrates a block diagram of a system 100 for classifying defects in a wafer using wafer defect images based on a deep learning network, according to some embodiments of the invention.
Throughout the present disclosure, the term "wafer" generally refers to a substrate formed of a semiconductor or non-semiconductor material. For example, semiconductor or non-semiconductor materials may include, but are not limited to, monocrystalline silicon, gallium arsenide, indium phosphide, and the like. The wafer may include one or more layers that may include, for example, but not limited to, resist, node material, conductive material, semiconductor material, and the like. For example, one or more layers formed on the wafer may be patterned or unpatterned. For example, the wafer may include a plurality of dies, each die having repeatable pattern features. The formation and processing of these material layers may result in a completed device. Further, as used herein, the terms "surface defect" or "defect" refer to both defects (e.g., particles) that are located entirely above the top surface of the wafer and defects that are located partially or entirely below the top surface of the wafer. Therefore, classification of defects is particularly useful for semiconductor materials such as wafers and materials formed on wafers. In addition, distinguishing between surface and subsurface defects may be particularly important for bare silicon wafers, silicon-on-insulator (SOI) films, strained silicon films, and dielectric films. Embodiments herein may be used to inspect wafers containing silicon or having silicon-containing layers formed thereon, such as silicon carbide, carbon-doped silicon dioxide, silicon-on-insulator (SOI), strained silicon, silicon-containing dielectric films, and the like.
In the embodiment of fig. 1, the system 100 includes an imaging device 102 and an electronic device 104. The imaging device 102 is associated with the electronic device 104 via a communication network 106. The communication network 106 may be a wired network or a wireless network. In one embodiment, the imaging device 102 may be, but is not limited to, at least one of an Automatic Optical Inspection (AOI) device, an automatic X-ray inspection (AXI) device, a Joint Test Action Group (JTAG) device, an in-circuit test (ICT) device, and the like. The imaging device 102 includes, but is not limited to, at least one of a light source 108, a camera lens 110, a defect detection module 112, and an image storage unit 126. For example, the defect detection module 112 associated with the imaging device 102 may detect a plurality of surface feature defects of a wafer, such as, but not limited to, at least one of silicon junctions (i.e., bumps), scratches, stains, and dimensional defects (e.g., open circuits, short circuits, solder thinning, etc.). In addition, the defect detection module 112 may also detect incorrect components, missing components, and misplaced components because the imaging device 102 is capable of performing all visual inspections.
Further, the electronic device 104 may be, but is not limited to, at least one of a mobile phone, a smart phone, a tablet, a handheld device, a phablet, a laptop, a computer, a Personal Digital Assistant (PDA), a wearable computing device, a virtual/augmented reality device, an Internet of Things (IoT) device, and the like. The electronic device 104 further includes a storage unit 116, a processor 118, and an input/output (I/O) interface 120. Further, the electronic device 104 includes a deep learning module 122. The deep learning module 122 enables the electronic device 104 to classify defects in the wafer using the wafer defect images obtained from the imaging device 102. The electronic device 104 may further include an application management framework for classifying defects in the wafer using a deep learning network. The application management framework may comprise different modules and sub-modules to perform the operation of classifying wafer defects using wafer defect images based on the deep learning network. Additionally, the modules and sub-modules may include at least one or both of software modules or hardware modules.
Thus, embodiments described herein are configured for image-based wafer process control and throughput enhancement. For example, one embodiment herein relates to a system and method for classifying defects in a wafer using wafer defect images based on a deep learning network.
In one implementation, the imaging device 102 may be configured to capture images of a wafer placed therein. For example, the images may include at least one of inspection images, optical or electron beam images, wafer inspection images, optical and SEM-based defect inspection images, simulation images, clips from a design layout, and the like. Further, the imaging device 102 may then be configured to store the captured images in the image storage unit 126 associated with the imaging device 102. In one embodiment, the electronic device 104 communicatively coupled to the imaging device 102 may be configured to retrieve the images stored in the image storage unit 126 associated with the imaging device 102. For example, the images may include black and white images, color images, intra-crack imaging (ICI) images, images previously scanned using the imaging device 102 (e.g., an AOI machine, etc.), images from the image storage unit 126 or several memory units (not shown), images obtained in real time from the imaging device 102 (e.g., an AOI machine, etc.), and the like. The electronic device 104 is then configured to load, from an external database (not shown) or from the storage unit 116 associated with the electronic device 104, at least one reference image (corresponding to at least one of a black and white reference image, a color reference image, and an ICI reference image) that represents the same scanned wafer area as the inspected image but contains no defects. Further, the electronic device 104 is configured to provide the reference images and the wafer images of the associated modes to the deep learning module 122. In one aspect of the invention, multiple deep learning models or deep learning classifiers may be trained for different types of classification of defects in a wafer. The plurality of deep learning models or deep learning classifiers may be, but are not limited to, at least one of Convolutional Neural Networks (CNNs) (e.g., LeNet, AlexNet, VGGNet, GoogLeNet, ResNet, etc.), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), random forest algorithms, autoencoders, and the like. The goal of training several deep learning models is that each model can be created to handle the synergy of the multiple modes for a specific subset of defects. Therefore, several deep learning models can be established, all defects can be stratified according to similarity and dissimilarity, and the training and classification processes can be distributed. In another embodiment, to shorten the training process and reduce the number of classification images required, a reference image for each mode image may be added to the architecture of each deep learning model. The reference images in the training process may allow for faster adjustment or training of the deep learning internal parameters by providing the deep learning module 122 with information about the internal relationship between the inspected image and the reference images. In addition, a deep learning module 122 trained to classify specific defects of a single wafer may also dynamically classify those trained defects when they occur in different wafers. Thus, the training process may discard common features, such as the underlying lithography, and may focus on the actual defects.
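To make the reference-image idea concrete, the following is a minimal, illustrative PyTorch sketch (not the patented implementation) of a deep learning classifier that receives an inspected image together with its mode-matched reference (golden die) image; the class name, layer sizes, and input shapes are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class RefAwareClassifier(nn.Module):
    """Hypothetical two-input model: inspected image plus reference (golden die) image.

    Encoding the image pair together lets training focus on the defect rather
    than on the underlying lithography shared by both images.
    """
    def __init__(self, num_classes: int, in_channels: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels * 2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, inspected: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
        # Stack the inspected image with its mode-matched reference image channel-wise
        x = torch.cat([inspected, reference], dim=1)
        features = self.encoder(x).flatten(1)
        return self.head(features)

# Example forward pass on dummy data
model = RefAwareClassifier(num_classes=4)
inspected = torch.randn(8, 3, 128, 128)   # batch of inspected defect crops
reference = torch.randn(8, 3, 128, 128)   # corresponding golden-die crops
logits = model(inspected, reference)      # shape: (8, 4)
```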
In one embodiment, the deep learning models may be connected in a parallel architecture or a series architecture. Further, the electronic device 104 may be configured to generate classification decisions for the wafer images using a Directed Acyclic Graph (DAG) architecture of a plurality of deep learning models. For example, the trained plurality of deep learning models may be invoked from the deep learning module 122 and then connected in a Directed Acyclic Graph (DAG) architecture for classification processing of wafer defects. Further, the electronic device 104 may be configured to save the classified wafer image, including the relevant category and metadata results (i.e., defect metadata), in an external database or the storage unit 116 associated with the electronic device 104.
In another embodiment, the electronic device 104 may be configured to load previously computed and stored defect metadata from an external database or the storage unit 116 associated with the electronic device 104. For example, the metadata includes different characteristics of the defect such as, but not limited to, the size of the defect, a histogram of the defect, a maximum color or grayscale value of the defect, a minimum color or grayscale value of the defect, and the like. If the defect metadata is not stored, the electronic device 104 may be configured to calculate features representing the defect metadata. The electronic device 104 may then be configured to provide the inspected images, the reference images, and the defect metadata (i.e., the metadata characteristics of the defect) to the trained deep learning models. Accordingly, the electronic device 104 may be configured to generate classification decisions for the wafer images using a Directed Acyclic Graph (DAG) architecture of the plurality of deep learning models. In addition, the electronic device 104 may be configured to store the classified wafer image, including the associated metadata results (i.e., defect metadata), in the storage unit 116 or an external database associated with the electronic device 104.
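As an illustrative sketch only, the metadata features named above (defect size, histogram, and maximum/minimum gray value) could be computed along the following lines; the function name and the assumption that a binary defect mask is available from the detection step are hypothetical.

```python
import numpy as np

def compute_defect_metadata(gray_crop: np.ndarray, defect_mask: np.ndarray) -> dict:
    """Hypothetical metadata features for a single detected defect.

    gray_crop:   2-D grayscale crop around the defect.
    defect_mask: boolean mask of the defect pixels within the crop
                 (assumed non-empty and assumed to come from the detection step).
    """
    defect_pixels = gray_crop[defect_mask]
    hist, _ = np.histogram(defect_pixels, bins=16, range=(0, 255))
    return {
        "size_px": int(defect_mask.sum()),     # defect area in pixels
        "histogram": hist.tolist(),            # gray-level histogram of the defect
        "max_gray": int(defect_pixels.max()),  # maximum gray value of the defect
        "min_gray": int(defect_pixels.min()),  # minimum gray value of the defect
    }
```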
In addition, the images and defect metadata may also be stored in an external database (not shown). For example, the external database may be used for the training process of the deep learning models/classifiers. As an embodiment, the images stored in the external database (or the image storage unit 126) may be black and white images, color images, ICI images, images previously scanned by an AOI device, and images containing wafer defects, false events, nuisance defects, and the like. The images may be labeled prior to storage in the external database (or the image storage unit 126). For each defect found in an image stored in the external database (or the image storage unit 126), a set of metadata features extracted from the defect image is also stored there. Metadata defect features may be provided by a user or by the AOI scanner results (or the metadata defect features may be created for data retrieval by a deep learning classifier). In addition, reference images (e.g., golden wafer images) including color reference images, black and white reference images, and/or ICI reference images may also be stored in the external database. A reference image is an image of the same wafer area without defects. The external database may also be used to perform the training process of the deep learning models.
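A minimal sketch of how such a training database might be exposed to the deep learning models is shown below, assuming (hypothetically) that each stored record pairs a labeled defect image with its reference image and its extracted metadata features.

```python
import torch
from torch.utils.data import Dataset

class WaferDefectDataset(Dataset):
    """Hypothetical training dataset pairing each labeled defect image with its
    mode-matched reference (golden die) image and its metadata feature vector."""

    def __init__(self, records):
        # records: list of dicts with keys assumed for illustration:
        #   "inspected", "reference": HxWxC float arrays
        #   "metadata": 1-D feature vector extracted from the defect
        #   "label": integer defect class
        self.records = records

    def __len__(self):
        return len(self.records)

    @staticmethod
    def _to_tensor(array):
        # Convert an HxWxC array into the CxHxW layout expected by CNNs
        return torch.as_tensor(array, dtype=torch.float32).permute(2, 0, 1)

    def __getitem__(self, idx):
        r = self.records[idx]
        return (self._to_tensor(r["inspected"]),
                self._to_tensor(r["reference"]),
                torch.as_tensor(r["metadata"], dtype=torch.float32),
                int(r["label"]))
```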
Embodiments herein use the synergy between the multiple modes of wafer defect images to make classification decisions. Furthermore, by adding a mix of modes, information can be obtained from different sources, such as color images, ICI images, black and white images, etc., to classify defect images. In addition to the blending of modes, a reference image (e.g., a golden wafer image) may be used for each mode. The advantage of providing a reference image for each mode image is that classification focuses on the defect itself rather than on the underlying lithography of the defect image. This approach saves processing power, memory utilization, and time. Furthermore, the reference images provided to the training process of the deep learning model may significantly reduce the number of labeled images and the training time (i.e., the time for the entire data set to be passed forward and backward through the deep learning neural network) required for the deep learning model to converge.
FIG. 2 illustrates a block diagram of a multi-modal late-fusion deep learning model that may be used as one of the deep learning models for classifying defects in a wafer using wafer defect images, according to some embodiments of the invention.
In one embodiment, the electronic device 104 includes a multi-modal Convolutional Neural Network (CNN) configured to integrate images acquired by different image sensors in a single forward pass. A deep learning model, such as a multi-modal late-fusion deep learning model, may take into account two sensor images, e.g., an ICI image using a first deep learning model and a color image using a second deep learning model. Further, as shown in fig. 2, the multi-modal CNN model includes CNN models for separately encoding the color image and the ICI image and combining the decisions of both. The trained multi-modal late-fusion deep learning model may process each mode so that a decision is made separately for each mode. Finally, a central classification layer may provide a common decision based on the different modes.
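A minimal late-fusion sketch, under the assumption of one color branch and one ICI branch with hypothetical layer sizes, might look as follows; each branch produces its own class logits, and a central layer fuses the two per-mode decisions.

```python
import torch
import torch.nn as nn

def make_branch(num_classes: int, in_channels: int = 3) -> nn.Sequential:
    """One per-mode CNN branch that outputs its own class logits."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        nn.Flatten(), nn.Linear(64, num_classes),
    )

class LateFusionModel(nn.Module):
    """Each mode is classified separately; a central layer fuses the decisions."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.color_branch = make_branch(num_classes)
        self.ici_branch = make_branch(num_classes)
        self.fusion = nn.Linear(2 * num_classes, num_classes)

    def forward(self, color_img: torch.Tensor, ici_img: torch.Tensor) -> torch.Tensor:
        color_logits = self.color_branch(color_img)
        ici_logits = self.ici_branch(ici_img)
        # Central classification layer combines the per-mode decisions
        return self.fusion(torch.cat([color_logits, ici_logits], dim=1))
```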
FIG. 3 illustrates a block diagram of a multi-modal hybrid fusion deep learning model that may be used as one of the deep learning models for classifying defects in a wafer using wafer defect images, according to some embodiments of the invention.
The multi-modal CNN model, e.g., the multi-modal hybrid fusion deep learning model, may include a first CNN model for encoding color images, a second CNN model for encoding ICI images, and a third CNN model for jointly representing the color and ICI defect images. The third (last) CNN model may learn the inter-mode relationship between color images and ICI images before making classification decisions.
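The hybrid variant can be sketched similarly (again with hypothetical layer sizes): two per-mode encoders feed a third, joint network that learns the inter-mode relationship before the classification decision.

```python
import torch
import torch.nn as nn

class HybridFusionModel(nn.Module):
    """Separate encoders per mode, plus a joint network that learns the
    inter-mode relationship before the classification decision."""
    def __init__(self, num_classes: int, feat_dim: int = 64):
        super().__init__()
        def encoder() -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, feat_dim, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.color_encoder = encoder()
        self.ici_encoder = encoder()
        self.joint = nn.Sequential(          # third network: joint representation
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, num_classes),
        )

    def forward(self, color_img: torch.Tensor, ici_img: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.color_encoder(color_img),
                           self.ici_encoder(ici_img)], dim=1)
        return self.joint(fused)
```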
Fig. 4 illustrates a block diagram of a multi-modal early fusion deep learning model that may be used as one of the deep learning models for classifying defects in a wafer using wafer defect images, according to some embodiments of the invention.
The multi-modal early fusion deep learning model may include a CNN model for jointly representing color defect images and ICI defect images by simultaneously processing the joint feature points in a single multi-modal image.
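An early-fusion sketch simply stacks the modes channel-wise before a single CNN; the channel counts below are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class EarlyFusionModel(nn.Module):
    """The color and ICI images are stacked channel-wise so that a single CNN
    processes the joint multi-modal image from the first layer onward."""
    def __init__(self, num_classes: int, color_ch: int = 3, ici_ch: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(color_ch + ici_ch, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(64, num_classes),
        )

    def forward(self, color_img: torch.Tensor, ici_img: torch.Tensor) -> torch.Tensor:
        # Channel-wise concatenation forms the single multi-modal input image
        return self.net(torch.cat([color_img, ici_img], dim=1))
```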
FIG. 5a illustrates a schematic diagram of a DAG topology using a series of deep learning models, according to some embodiments of the invention.
As shown in fig. 5a, a plurality of deep learning models may be connected as a polytree, i.e., a Directed Acyclic Graph (DAG) of deep learning models whose underlying undirected graph is a tree. The deep learning models in the DAG may include a multi-modal hybrid fusion deep learning model, a multi-modal early fusion deep learning model, a single-input-image deep learning model, an autoencoder with one or two input images, and/or a Generative Adversarial Network (GAN) deep learning model. The DAG may include a unique topological order, and each deep learning model may be located on a node of the DAG. Furthermore, each node may be directly connected to one or more preceding nodes and to one or more subsequent nodes. Further, the result labels of each deep learning model define the flow paths in the DAG. For example, as shown in fig. 5a, an image that receives the result label "label 1" in "model 1" will continue to be evaluated in "model 3".
As an example, the result labels of each model are described in FIG. 5b. For "model 1", the result label "label 1: A" may have a probability value of 0.9, while "label 1: B" may have a probability value of 0.1. Likewise, for "model 3", the result label "label 3: A" may have a probability value of 0.2, "label 3: B" may have a probability value of 0.7, and "label 3: C" may have a probability value of 0.1. Further, for "model 5", the result label "label 5: A" has a probability value of 0.1, "label 5: B" has a probability value of 0.1, "label 5: C" has a probability value of 0.2, and "label 5: D" has a probability value of 0.6. Each deep learning model in the DAG may be unique and may be designed to handle a particular portion of the classification problem. For example, one deep learning model in the DAG can be a ResNet model, another can be a GoogLeNet model, and yet another can be a multi-modal deep learning model. At the end of the DAG path, each image may be evaluated in a post-processing module (as shown in FIG. 5a), where a decision may be made based on the results of the deep learning models that interacted with the image.
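The routing and post-processing described above can be sketched as follows; the routing table, the model `predict` interface, and the post-processing rule are all hypothetical placeholders rather than the actual implementation.

```python
# Hypothetical routing table: for each DAG node, the predicted label decides
# which model evaluates the image next; None means "go to post-processing".
DAG_ROUTES = {
    "model_1": {"label 1: A": "model_3", "label 1: B": None},
    "model_3": {"label 3: A": "model_5", "label 3: B": None, "label 3: C": None},
    "model_5": {"label 5: A": None, "label 5: B": None,
                "label 5: C": None, "label 5: D": None},
}

def classify_through_dag(models, images, metadata, root="model_1"):
    """Walk the DAG from the root node, collecting each visited model's
    result label and confidence, then hand everything to post-processing."""
    history, node = [], root
    while node is not None:
        label, confidence = models[node].predict(images)  # assumed model interface
        history.append((node, label, confidence))
        node = DAG_ROUTES[node].get(label)
    return post_process(history, metadata)

def post_process(history, metadata):
    """Illustrative post-processing rule only: keep the final label unless its
    confidence is low and the defect-size metadata contradicts it."""
    _, label, confidence = history[-1]
    if confidence < 0.5 and metadata.get("size_px", 0) > 500:
        return "manual_review"
    return label
```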
Fig. 6a illustrates a flow diagram of a method 600a for classifying defects in a wafer using wafer defect images based on a deep learning network, according to some embodiments of the present application.
At block 601, an image of the wafer is captured by the imaging device 102. At block 602, the captured image is stored by the imaging device 102 (FIG. 1) in the image storage unit 126 (FIG. 1) associated with the imaging device 102. At block 603, the image stored in the image storage unit 126 associated with the imaging device 102 is retrieved by the electronic device 104 (FIG. 1). At block 604, the electronic device 104 receives at least one reference image, corresponding to at least one of a black and white reference image, a color reference image, and an ICI reference image, which represents the same scanned wafer area as the inspected image but contains no defects. At block 605, the electronic device 104 retrieves the trained plurality of deep learning models/classifiers, associated with the expected mode images, from the deep learning module 122 of the electronic device 104. At block 606, the trained plurality of deep learning models are connected by the electronic device 104 in a Directed Acyclic Graph (DAG) architecture for classification processing of wafer image defects. At block 607, a classification decision for the wafer image is generated by the electronic device 104 using the Directed Acyclic Graph (DAG) framework of the plurality of deep learning models. Finally, at block 608, the classified wafer image including the associated metadata results (i.e., defect metadata) is stored by the electronic device 104 in an external database or the storage unit 116 of the electronic device 104.
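A structural sketch mirroring blocks 601 to 608 is given below; every object and method name (imaging_device, storage, deep_learning_module, etc.) is a hypothetical placeholder used only to show the order of operations, not an actual API.

```python
def classify_wafer_image(imaging_device, storage, deep_learning_module):
    """Hypothetical orchestration mirroring blocks 601-608 of method 600a."""
    image_set = imaging_device.capture()                    # 601: capture wafer images
    storage.save_images(image_set)                          # 602: store captured images
    inspected = storage.load_images()                       # 603: retrieve stored images
    references = storage.load_reference_images(inspected)   # 604: defect-free references
    models = deep_learning_module.load_trained_models()     # 605: trained per-mode models
    dag = deep_learning_module.connect_as_dag(models)       # 606: connect models as a DAG
    decision = dag.classify(inspected, references)          # 607: classification decision
    storage.save_result(inspected, decision)                # 608: store image and metadata
    return decision
```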
Fig. 6b illustrates a flow diagram of a method 600b for computing features representing defect metadata of a wafer defect image when the defect metadata is not already stored by the electronic device 104, according to some embodiments of the present application.
At block 611, the electronic device 104 receives previously computed and stored defect metadata from an external database or the storage unit 116 of the electronic device 104. For example, the metadata includes different characteristics of the defect such as, but not limited to, the size of the defect, a histogram of the defect, a maximum color or grayscale value of the defect, a minimum color or grayscale value of the defect, and the like. At block 612, if the defect metadata is not stored, features representing the defect metadata are computed by the electronic device 104.
Embodiments herein may utilize a Directed Acyclic Graph (DAG) as a combination of deep learning models, where each deep learning model may use a defect wafer image to address a different aspect of the problem or a different form of defect in the wafer. Further, the created DAG may have any number of models, and each deep learning model may receive multiple different images (e.g., six images). Further, the post-processing decision module may be configured to combine parameters such as the result labels and confidence values produced for the defect inspection image by each deep learning model in the DAG, as well as metrology information (metadata) of the defect previously collected in the scanner. Based on the deep learning network, a DAG of deep learning models can be used to accurately classify wafer defects using the wafer defect images.
With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. Various singular/plural permutations may be expressly set forth herein for the sake of clarity.
It will be understood by those within the art that, in general, terms used herein are generally "open" terms (e.g., the term "including" should be interpreted as "including but not limited to," the term "having" should be interpreted as "having at least," the term "includes" should be interpreted as "includes but is not limited to," etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited. For example, as an aid to understanding, the detailed description may include usage of the introductory phrases "at least one" and "one or more" to introduce a claim recitation. However, the use of such phrases should not be construed to limit any particular claim containing such a recitation introduced by the indefinite articles "a" or "an" to inventions containing only one such recitation, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an" (e.g., "a" or "an" should typically be interpreted to mean "one or more" or "at least one"); the same holds true for the use of definite articles used to introduce claim recitations. Furthermore, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should typically be interpreted to mean at least the recited number (e.g., "two recitations," without other modifiers, typically means at least two recitations, or two or more recitations).
While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and not limitation, with the true scope and spirit being indicated by the following claims.
Reference numerals

Reference numeral | Detailed description
100 | System
102 | Imaging device
104 | Electronic device
106 | Communication network
108 | Light source
110 | Camera lens
112 | Defect detection module
116 | Storage unit
118 | Processor
120 | I/O interface
122 | Deep learning module
126 | Image storage unit