CN119180988A - A visual recognition method and device based on computer processing - Google Patents
A visual recognition method and device based on computer processing
- Publication number
- CN119180988A (application CN202411219159.2A)
- Authority
- CN
- China
- Prior art keywords
- wafer
- defect
- image
- model
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/766—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/36—Applying a local operator, i.e. means to operate on image points situated in the vicinity of a given point; Non-linear local filtering operations, e.g. median filtering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/06—Recognition of objects for industrial automation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Biodiversity & Conservation Biology (AREA)
- Testing Or Measuring Of Semiconductors Or The Like (AREA)
- Investigating Materials By The Use Of Optical Means Adapted For Particular Applications (AREA)
Abstract
The invention provides a visual recognition method and device based on computer processing. The method comprises: S1, collecting picture data from a wafer production line, and preprocessing and labeling the collected picture data to obtain a wafer defect detection data set; S2, constructing a feature extraction and feature enhancement network for the data set obtained in S1 to obtain a multi-level feature map; S3, designing a wafer defect detection model based on the multi-level feature map obtained in S2 to obtain a semiconductor wafer defect detection model; S4, constructing a training strategy combining adversarial collaborative learning based on the model obtained in S3, and training to obtain the final wafer defect detection model; S5, deploying the wafer defect detection model obtained in S4 on the vision module of production process equipment to detect and feed back defect problems in real time. The invention has the beneficial effect of significantly improving detection accuracy and robustness.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a multi-task collaborative visual detection method for semiconductor wafer defects.
Background
With the rapid development of semiconductor technology, the types of defects occurring in wafer fabrication are becoming more complex and diverse. These defects not only affect wafer yield but also have profound effects on the performance and reliability of the final product. Accurate defect detection is therefore critical to ensuring high-quality wafer fabrication. However, as process technology advances, conventional inspection methods increasingly struggle with small and complex defects and can no longer meet modern manufacturing requirements. There is thus an urgent need for more advanced inspection technologies that can cope with growing inspection demands while ensuring production efficiency and product quality.
Existing PCB and semiconductor wafer defect detection mainly relies on three kinds of methods: template matching-based methods, image segmentation-based methods, and machine learning-based methods:
(1) Template matching-based method:
This method detects defects by pixel-level comparison of the wafer image under inspection with a standard defect-free wafer image. Although simple to implement, it has the following disadvantages:
a) Image alignment requirements are extremely strict, and small deviations can cause false detections;
b) Its ability to detect non-repetitive defects is poor;
c) Computational complexity is high, making it difficult to meet the real-time requirements of online inspection.
(2) Image segmentation-based method:
This method first segments the wafer image and then analyzes the features of the segmented regions to identify defects. Its major drawbacks include:
a) Segmentation algorithms lack robustness and are easily affected by image noise and illumination variation;
b) Detection of micro defects and low-contrast defects is poor;
c) It is difficult to effectively distinguish defects from normal structural variation.
(3) A machine learning based method:
Such methods use traditional machine learning algorithms (e.g., SVM, random forest) to classify extracted image features. Although an improvement over the former two approaches, the following problems remain:
a) The feature engineering relies on manual design, so that complex defect features are difficult to comprehensively capture;
b) The generalization capability is limited, and the adaptability to the type of the defects which are not found is poor;
c) It is difficult to achieve both high accuracy and high efficiency detection.
These limitations show that existing semiconductor wafer defect detection methods suffer from insufficient precision, low efficiency, and weak generalization when handling complex and diverse wafer defects. In particular, at advanced process nodes, traditional methods struggle with increasingly miniaturized and diversified defect types and cannot satisfy modern semiconductor manufacturing's demand for high-precision, high-efficiency defect detection. A new semiconductor wafer defect detection method is therefore needed to overcome these drawbacks and realize high-precision, high-efficiency, and robust detection.
Disclosure of Invention
To solve the above problems, the invention provides a visual recognition method for computer processing that effectively realizes high-precision, high-efficiency wafer defect detection and shows particularly strong adaptability and robustness in the presence of complex and diverse defect types.
A visual recognition method for computer processing, the method comprising:
S1, acquiring original semiconductor wafer image data from a real semiconductor wafer production line; designing center-cropping, denoising, and data-enhancement preprocessing methods for the acquired original image data to ensure the quality and smoothness of the image data; and performing label arbitration to obtain a complete wafer defect detection data set;
S2, constructing a dynamic multi-scale feature extraction and feature enhancement network of the wafer defect image according to the wafer defect detection data set obtained in the S1, and obtaining a multi-level feature map containing local details and global semantic information of the wafer defect image;
S3, designing a multi-task collaborative wafer defect detection model based on the multi-level feature map of wafer defects to obtain classification, regression, and circular boundary predictions of wafer defects, thereby realizing visual detection of semiconductor wafer defects;
S4, constructing a training strategy combining adversarial collaborative learning based on the semiconductor wafer defect detection model, and performing multi-stage progressive training to obtain the final wafer defect detection model;
S5, model deployment: deploying the wafer defect detection model obtained in S4 on the vision module of semiconductor wafer production process equipment, so as to detect and feed back in real time whether produced semiconductor wafers have defects.
In the visual recognition method described above, the collection and construction of the data set in S1 includes:
(1) Raw data acquisition
Semiconductor wafer defect image data are acquired by installing high-precision industrial cameras on a semiconductor wafer production line. An ace-series industrial camera from Basler (Germany; model acA4112-20um) is used, with a 4112 × 3008 pixel resolution and an acquisition speed of up to 20 frames per second. The camera is mounted above the wafer transport track and works with the light source system to ensure that a clear image of the wafer surface is captured.
To ensure image quality, an LED ring light source (model CCS LDR2-70SW2-LA1) provides uniform and stable illumination. During image acquisition, triggering of the camera and light source is precisely controlled by a PLC (programmable logic controller) to ensure that each wafer is completely captured.
The raw image data are saved in 16-bit TIFF format, with 65,536 gray levels per pixel, to preserve high dynamic range and detail. Each image file is about 24 MB, with the naming convention "wafer_YYYYMMDD_HHMMSS_<sequence number>.tiff", where YYYYMMDD is the date, HHMMSS is the time, and the sequence number is the order of acquisition on that day.
Data diversity and representativeness: to ensure the diversity and representativeness of the data set, wafer samples were collected from different production lots and different process stages, specifically including:
Wafers of different sizes: 6-inch, 8-inch, and 12-inch;
Wafers of different process nodes: 28 nm, 14 nm, and 7 nm;
Wafers at different process stages: after photolithography, etching, and polishing;
Different defect types: scratch, crack, flaking, and foreign-object attachment;
Different defect severities: ranging from mild to severe.
A total of 100,000 raw wafer images were acquired, covering the various defect types as well as defect-free samples. Data analysis shows that the defect classes are essentially balanced within the data set, with defect-free samples accounting for about 60% to reflect the frequency of defects in actual production. The acquired raw image data are denoted dataset.
(2) Data preprocessing
ROI region cropping and its effect on model accuracy: considering the circular nature of the wafer, a circular ROI (region of interest) cropping algorithm is designed:
First, the wafer edge is detected with the Hough circle transform to obtain the wafer's center coordinates and radius. Second, taking the detected center as the circle center, a circular region with a radius slightly smaller than the detected radius (98%) is taken as the ROI. Finally, pixel values outside the ROI are set to 0 while the original image information inside the ROI is retained. This cropping effectively removes interference from the wafer edge and focuses the model's attention on defects in the central region.
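A minimal sketch of this cropping step, assuming an OpenCV pipeline; the Hough-transform parameter values and the 16-bit-to-8-bit normalization are illustrative assumptions, not values fixed by the method:

```python
import cv2
import numpy as np

def crop_wafer_roi(img, radius_scale=0.98):
    """Detect the wafer edge with a Hough circle transform, then zero out
    all pixels outside 98% of the detected radius."""
    gray = img if img.ndim == 2 else cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray8 = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    blurred = cv2.medianBlur(gray8, 5)  # suppress noise before circle detection
    circles = cv2.HoughCircles(
        blurred, cv2.HOUGH_GRADIENT, dp=1.5,
        minDist=gray8.shape[0],          # expect a single wafer per image
        param1=100, param2=50,
        minRadius=gray8.shape[0] // 4, maxRadius=gray8.shape[0] // 2)
    if circles is None:
        return img                       # fall back to the full frame
    cx, cy, r = np.round(circles[0, 0]).astype(int)
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    cv2.circle(mask, (cx, cy), int(r * radius_scale), 255, thickness=-1)
    return cv2.bitwise_and(img, img, mask=mask)  # pixels outside the ROI become 0
```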
Image noise reduction: wafer images generally contain Gaussian white noise and speckle noise, caused by thermal noise in the sensor readout circuits of the industrial camera and by high-frequency image components. To reduce the influence of this noise on image quality and subsequent processing, image denoising is required.
First, Gaussian white noise is removed by Gaussian filtering, which smooths the image by convolving it with a Gaussian kernel. The Gaussian kernel is computed as:
G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
where σ is the standard deviation of the Gaussian kernel, and x and y are the offsets of a pixel from the filter center; these offsets determine the value of the Gaussian distribution function and hence the weight assigned to each pixel. Convolving the image with this kernel effectively reduces Gaussian white noise.
Data enhancement: to increase data diversity and improve model generalization, the following augmentation techniques are adopted:
Random rotation: the image is randomly rotated within [−10°, 10°];
Random scaling: the image is randomly scaled within [0.9, 1.1];
Random translation: the image is randomly shifted within ±5% of its width and height;
Random brightness and contrast adjustment: brightness within [0.8, 1.2] and contrast within [0.8, 1.2];
Random Gaussian noise: Gaussian noise with mean 0 and standard deviation in [0, 0.05] is added;
Random horizontal and vertical flips: each applied with 50% probability.
These augmentations are implemented with the Albumentations library and applied in real time during training, as sketched below. Each original image yields 5 enhanced images, expanding the data set to 500,000 images. This enhancement strategy significantly improves the model's ability to adapt to varied lighting conditions and defect morphologies.
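A sketch of such a pipeline, assuming the Albumentations Python library; the probabilities and the noise-variance mapping are illustrative, and exact argument names can vary across Albumentations versions:

```python
import albumentations as A

# Augmentations mirroring the ranges listed above.
augment = A.Compose([
    A.Rotate(limit=10, p=0.5),                      # rotation in [-10°, 10°]
    A.RandomScale(scale_limit=0.1, p=0.5),          # scaling in [0.9, 1.1]
    A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.0,
                       rotate_limit=0, p=0.5),      # translation within ±5%
    A.RandomBrightnessContrast(brightness_limit=0.2,
                               contrast_limit=0.2, p=0.5),
    A.GaussNoise(var_limit=(0.0, (0.05 * 255) ** 2), p=0.5),  # σ ≤ 0.05 on a [0, 1] scale
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))

# Applied per sample during training; bounding boxes are transformed alongside pixels.
out = augment(image=image, bboxes=bboxes, labels=labels)
```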
The image data after the data preprocessing is recorded as data.
(3) Label arbitration
The collected raw image data are manually labeled to provide accurate training data for subsequent defect detection tasks. The industrial-grade labeling tool LabelImg is used; it offers an intuitive user interface and rich functionality, letting the user draw bounding boxes on the image to annotate target objects and attach label information to each box. Labeling follows these rules:
Strict pixel-level labeling: annotators accurately draw the bounding box of each defect region on the image to ensure labeling accuracy and precision;
Defect classification, regression, and circular boundaries are used as labels. The classification label, denoted label_cls, corresponds to five defect types (no defect, scratch, crack, flaking, foreign-object adhesion), where the value of each dimension represents the confidence score for that defect type. The regression label, denoted label_reg, contains the bounding-box coordinates of the target and is used to accurately locate the defect. The circular boundary label, denoted label_cir, gives circle parameters closely tied to the defect boundary, further improving boundary-localization accuracy;
Each image may contain zero or more defect labels; annotators must label every defect region actually present in the image;
Storage of labeling results: results are saved in PASCAL VOC format, which can be conveniently integrated with and processed by the wafer defect detection model.
The labeling process requires labeling personnel to have certain expertise and experience so as to ensure the accuracy and usability of labeling results.
(4) Data set generation
To prepare for model training, evaluation, and testing, the labeled data set (data, label) is divided into a training set, a validation set, and a test set in an 8:1:1 ratio: 80% of the data forms the training set, 10% the validation set, and 10% the test set.
This partitioning ensures sufficient data for parameter adjustment and optimization during training, while evaluation on the validation set and final performance verification on the test set after training safeguard the model's generalization and validity. A minimal split sketch follows.
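A minimal sketch of the 8:1:1 split, assuming samples are (data, label) pairs; the fixed seed is an illustrative choice for reproducibility:

```python
import random

def split_dataset(samples, seed=42):
    """Shuffle and split (data, label) pairs into train/validation/test at 8:1:1."""
    rng = random.Random(seed)
    samples = samples[:]              # avoid mutating the caller's list
    rng.shuffle(samples)
    n = len(samples)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (samples[:n_train],                      # training set
            samples[n_train:n_train + n_val],       # validation set
            samples[n_train + n_val:])              # test set
```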
In the visual recognition method described above, the construction of the dynamic multi-scale feature extraction and feature enhancement network in S2 includes:
This scheme proposes a dynamic multi-scale feature enhancement network for wafer images that breaks through the limitation of traditional single-scale feature extraction. Through multi-scale feature extraction and enhancement combined with a dynamic weight allocation mechanism, the network captures local details and global semantics simultaneously, markedly improving the richness and robustness of the feature representation. In particular, the module that dynamically adjusts feature weights highlights key information and suppresses irrelevant noise, strengthening the model's adaptability in complex production-line environments. The network design also attends to computational efficiency: through feature reuse and a lightweight structure, it balances performance and real-time operation, making it suitable for semiconductor wafer production-line equipment. The approach not only improves defect-detection accuracy but also enhances system robustness under varied visual interference.
The input wafer image is processed from the bottom layer to the top layer: feature maps {C2, C3, C4, C5} at different scales are first extracted, and the multi-scale feature enhancement network is then constructed as follows:
Starting from the deepest layer, upper-level wafer feature maps are generated by upsampling, and the high-level semantic information is passed to the top of the multi-scale feature enhancement network:
M5=C5
where M5 is the top-most feature map in the multi-scale feature enhancement network, taken from the deepest feature map C5; a 1×1 convolution is then applied to M5 to adjust the number of channels:
P5=Conv1×1(M5)
The rest of the multi-scale feature enhancement network is then constructed from top to bottom:
M4=C4+Upsample(P5)
P4=Conv3×3(M4)
where Conv3×3 is a 3×3 convolution used to fuse features of different scales, and Upsample is an upsampling operation implemented by deconvolution. These steps are repeated to progressively generate {P3, P4, P5}; together, P3, P4, and P5 form the multi-scale feature enhancement network fused with multi-scale information;
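A compact sketch of this top-down construction, assuming a PyTorch implementation; the 1×1 lateral convolutions on C3 and C4 and the backbone channel counts are added assumptions needed to make the element-wise additions well-typed:

```python
import torch
import torch.nn as nn

class MultiScaleEnhancement(nn.Module):
    """Top-down construction: M5 = C5, P5 = Conv1x1(M5),
    M_i = C_i + Upsample(P_{i+1}), P_i = Conv3x3(M_i)."""
    def __init__(self, c3_ch=512, c4_ch=1024, c5_ch=2048, out_ch=256):
        super().__init__()
        self.reduce5 = nn.Conv2d(c5_ch, out_ch, 1)                 # P5 = Conv1x1(M5)
        self.lat4 = nn.Conv2d(c4_ch, out_ch, 1)                    # channel-matching lateral
        self.lat3 = nn.Conv2d(c3_ch, out_ch, 1)
        self.up = nn.ConvTranspose2d(out_ch, out_ch, 2, stride=2)  # deconvolution upsampling
        self.smooth4 = nn.Conv2d(out_ch, out_ch, 3, padding=1)     # P4 = Conv3x3(M4)
        self.smooth3 = nn.Conv2d(out_ch, out_ch, 3, padding=1)     # P3 = Conv3x3(M3)

    def forward(self, c3, c4, c5):
        p5 = self.reduce5(c5)              # M5 = C5, then 1x1 channel adjustment
        m4 = self.lat4(c4) + self.up(p5)   # M4 = C4 + Upsample(P5)
        p4 = self.smooth4(m4)
        m3 = self.lat3(c3) + self.up(p4)   # M3 = C3 + Upsample(P4)
        p3 = self.smooth3(m3)
        return p3, p4, p5
```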
A dynamic feature-weight allocation module is integrated into the network to enhance adaptability to multi-scale features and to learn correlations between features at different positions in the wafer feature map. By highlighting important features and suppressing irrelevant ones, the mechanism first generates a spatial attention sub-module, i.e. an attention weight for each location, and then combines it with the original feature map to obtain an enhanced feature representation. The specific steps are:
The input of the feature-weight dynamic allocation module is a wafer feature map X with dimensions (C × H × W). Two 1×1 convolution kernels are applied to X for channel compression, producing compressed wafer feature maps A and B with dimensions (C′ × H × W):
A = W_a · X   (C′ × H × W)
B = W_b · X   (C′ × H × W)
where W_a and W_b are the weights of the two 1×1 convolution kernels. The similarity between positions is computed from A and B to generate the spatial attention map M with dimensions (HW × HW):
M = Softmax(Aᵀ · B)
Element-wise multiplication of the original input X with M then gives the wafer feature map X′ weighted by the spatial attention sub-module:
X′ = X · M
Finally, X′ is added back to X and passed through a convolution layer:
O = γ · X′ + X · W_c
where γ is a learnable scaling factor that scales the enhancement effect of the spatial attention sub-module, W_c is a 1×1 convolution kernel weight, and O is the final wafer feature map output by the feature-weight dynamic allocation module.
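A sketch of this module, assuming a PyTorch implementation; the channel-reduction ratio C′ = C/8 is an illustrative assumption:

```python
import torch
import torch.nn as nn

class DynamicFeatureWeight(nn.Module):
    """Position-attention block following the equations above:
    A = W_a·X, B = W_b·X, M = Softmax(Aᵀ·B), X' = X·M, O = γ·X' + W_c·X."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        c_ = channels // reduction
        self.conv_a = nn.Conv2d(channels, c_, 1)        # W_a (1x1, channel compression)
        self.conv_b = nn.Conv2d(channels, c_, 1)        # W_b
        self.conv_c = nn.Conv2d(channels, channels, 1)  # W_c
        self.gamma = nn.Parameter(torch.zeros(1))       # learnable scaling factor γ

    def forward(self, x):
        n, c, h, w = x.shape
        a = self.conv_a(x).flatten(2)                     # (N, C', HW)
        b = self.conv_b(x).flatten(2)                     # (N, C', HW)
        m = torch.softmax(a.transpose(1, 2) @ b, dim=-1)  # (N, HW, HW) attention map M
        x_att = (x.flatten(2) @ m).view(n, c, h, w)       # X' = X · M
        return self.gamma * x_att + self.conv_c(x)        # O = γ·X' + W_c·X
```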
In the visual recognition method described above, the construction of the multi-task collaborative wafer defect detection model in S3 includes:
In defect detection, classification and regression are two interrelated but usually separately handled sub-tasks. Traditional methods model and optimize the two tasks independently, lacking knowledge fusion and mutual reinforcement between them. To solve this problem, the invention designs a Multi-task Collaborative Detection Algorithm (MCDA).
The key of the MCDA is a multi-task collaborative coding mechanism that fuses classification and regression information. The model cascades the feature maps of the two sub-tasks through convolution layers so that each encodes information from the other, yielding classification and regression features that capture their bidirectional relation. In addition, the MCDA introduces a circular boundary prediction branch: by cascading the classification and regression predictions, circle parameters tied to the defect boundary can be predicted more finely, improving boundary-localization accuracy. Compared with traditional separate modeling, the MCDA better mines the intrinsic relations among classification, regression, and shape, improving the consistency and accuracy of overall detection. The details are as follows:
The feature map O output by the dynamic multi-scale feature extraction and feature enhancement network is fed into the MCDA, which simultaneously completes classification, regression, and circular boundary prediction:
cls_pred, reg_pred, cir_pred = MCDA(O)
where cls_pred, reg_pred, and cir_pred are the classification, regression, and circular boundary prediction results for wafer defect detection, respectively.
The specific calculation process of the MCDA is as follows:
1) Apply a base convolution to O to obtain the initial predictions {cls_init, reg_init}:
cls_init, reg_init = Conv(O)
2) Relation-encode reg_init with cls_init to obtain the regression features reg_fused that fuse their relation:
reg_fused = reg_init + Conv(Concat(reg_init, cls_init))
3) Likewise, relation-encode cls_init with reg_init to obtain the classification features cls_fused:
cls_fused = cls_init + Conv(Concat(cls_init, reg_init))
4) Convolve reg_fused and cls_fused separately to yield the final classification prediction cls_pred and regression prediction reg_pred:
cls_pred = Conv(cls_fused)
reg_pred = Conv(reg_fused)
5) Concatenate cls_pred and reg_pred and, through an additional convolution branch, predict the circle parameters cir_pred closely tied to the defect boundary:
cir_pred = Conv(Concat(cls_pred, reg_pred))
Through this multi-task collaborative coding, classification and regression knowledge are fully fused so that the two tasks reinforce each other, while the circular boundary prediction branch describes the shape and location of wafer defects more finely. A sketch of the head follows.
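A sketch of the MCDA head, assuming a PyTorch implementation; the channel widths, the (x, y, w, h) box and (cx, cy, r) circle parameterizations, and the five-class output are assumptions consistent with the text:

```python
import torch
import torch.nn as nn

class MCDAHead(nn.Module):
    """Multi-task collaborative detection head following steps 1)-5) above."""
    def __init__(self, channels=256, num_classes=5):
        super().__init__()
        self.base = nn.Conv2d(channels, 2 * channels, 3, padding=1)      # step 1
        self.fuse_reg = nn.Conv2d(2 * channels, channels, 3, padding=1)  # step 2
        self.fuse_cls = nn.Conv2d(2 * channels, channels, 3, padding=1)  # step 3
        self.head_cls = nn.Conv2d(channels, num_classes, 1)              # step 4
        self.head_reg = nn.Conv2d(channels, 4, 1)                        # box (x, y, w, h)
        self.head_cir = nn.Conv2d(num_classes + 4, 3, 1)                 # circle (cx, cy, r)

    def forward(self, o):
        cls_init, reg_init = self.base(o).chunk(2, dim=1)                         # step 1
        reg_fused = reg_init + self.fuse_reg(torch.cat([reg_init, cls_init], 1))  # step 2
        cls_fused = cls_init + self.fuse_cls(torch.cat([cls_init, reg_init], 1))  # step 3
        cls_pred = self.head_cls(cls_fused)                                       # step 4
        reg_pred = self.head_reg(reg_fused)
        cir_pred = self.head_cir(torch.cat([cls_pred, reg_pred], 1))              # step 5
        return cls_pred, reg_pred, cir_pred
```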
In existing object detection methods, common loss functions such as Focal Loss and CIoU Loss optimize only single-task objectives, without accounting for interaction and knowledge guidance across tasks. To solve this problem, the invention proposes a multi-task collaborative loss function.
The innovation of the loss function is to integrate three parts of classification, regression and circular boundary so as to realize joint optimization of three closely related subtasks of classification, regression and shape. Compared with the traditional single loss function, the multi-task cooperative loss function fully utilizes the correlation among the subtasks, realizes the mutual promotion among the tasks, and simultaneously integrates the guidance of priori knowledge, so that the model benefits from additional knowledge transfer in the optimization process, and the overall performance and the robustness of detection are improved.
Therefore, the invention adopts a multi-task cooperative loss function, and simultaneously optimizes three tasks of classification, regression and circular boundary prediction, namely:
L = λ1·L_cls + λ2·L_reg + λ3·L_cir
where L_cls is the classification loss (Focal Loss), L_reg is the regression loss (CIoU Loss), L_cir is the circular boundary loss (Polygon Loss), and λ1, λ2, λ3 are hyperparameters balancing the loss terms.
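A sketch of the combined loss, assuming a PyTorch implementation. The focal and CIoU terms use existing torchvision operators; the Polygon Loss is not a standard library loss, so a simple L1 loss on the circle parameters stands in for it here, and the λ values are illustrative:

```python
import torch.nn.functional as F
from torchvision.ops import sigmoid_focal_loss, complete_box_iou_loss

def multitask_loss(cls_pred, reg_pred, cir_pred,
                   label_cls, label_reg, label_cir,
                   lam1=1.0, lam2=1.0, lam3=0.5):
    """L = λ1·L_cls + λ2·L_reg + λ3·L_cir for matched predictions and targets."""
    l_cls = sigmoid_focal_loss(cls_pred, label_cls, reduction='mean')     # Focal Loss
    l_reg = complete_box_iou_loss(reg_pred, label_reg, reduction='mean')  # CIoU Loss, (x1, y1, x2, y2) boxes
    l_cir = F.l1_loss(cir_pred, label_cir)   # stand-in for the Polygon Loss on (cx, cy, r)
    return lam1 * l_cls + lam2 * l_reg + lam3 * l_cir
```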
Under the optimization of the multi-task collaborative loss function, the detection network outputs three predictions: the classification prediction cls_pred, the regression prediction reg_pred, and the circular boundary prediction cir_pred.
The classification prediction cls_pred is a vector corresponding to four defect types (scratch, crack, flaking, foreign-object adhesion), where each dimension's value is the confidence score for that defect type; the defect type in the current prediction box is determined by the index of the highest score;
The regression prediction reg_pred contains the bounding-box coordinates of the target and is used to accurately locate the defect;
The circular boundary prediction cir_pred gives circle parameters closely tied to the defect boundary, further improving boundary-localization accuracy.
After the three prediction results are obtained, the defect type can be judged, and accurate position and shape information is combined, so that accurate detection of various defects of the wafer is realized, and an important basis is provided for subsequent quality control and defect repair.
In the visual recognition method described above, step S4 includes:
The labeled training data set is input into the model, which is trained with an innovative adversarial collaborative learning strategy. The strategy performs multi-stage progressive training that fully accounts for the local detail and global semantic information in the wafer image data and their dynamic relation during fusion, improving model performance through multi-scale temporal consistency and deep collaborative optimization.
(1) Multi-scale temporal consistency loss:
To strengthen the model's learning from data batches at different times, a multi-scale temporal consistency loss is proposed. It measures prediction consistency across different time scales:
L_tc = Σ_s λ_s · D_KL(P(Y|X_t) ‖ P(Y|X_{t−s}))
where s indexes the time scales, λ_s is the corresponding weight, D_KL is the KL divergence, and P(Y|X_t) is the detection distribution at time step t, aggregating classification, regression, and circular boundary detections into one overall detection result. This loss encourages the model to keep its predictions consistent across time scales, improving its modeling of data from different time batches.
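A sketch of this loss over a buffer of per-step classification logits, assuming a PyTorch implementation; the scales, weights, and the choice to detach the earlier predictions are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def temporal_consistency_loss(logits_history, scales=(1, 2, 4), weights=(1.0, 0.5, 0.25)):
    """L_tc = Σ_s λ_s · D_KL(P(Y|X_t) || P(Y|X_{t-s})) over a rolling buffer of logits."""
    p_t = F.softmax(logits_history[-1], dim=1)          # current detection distribution
    loss = logits_history[-1].new_zeros(())
    for s, lam in zip(scales, weights):
        if len(logits_history) > s:
            log_p_prev = F.log_softmax(logits_history[-1 - s].detach(), dim=1)
            # F.kl_div(log_q, p) computes KL(p || q), matching the formula's direction
            loss = loss + lam * F.kl_div(log_p_prev, p_t, reduction='batchmean')
    return loss
```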
(2) Adversarial training strategy:
To improve training robustness, an adversarial training strategy is introduced. Specifically, a discriminator D is designed that attempts to distinguish whether a feature comes from local detail or from global semantic information. The fusion network F is then trained to fool the discriminator:
L_adv = E[log D(F(X_v, X_a))] + E[log(1 − D(X_v)) + log(1 − D(X_a))]
where X_v and X_a are the local-detail and global-semantic inputs, respectively. This adversarial training forces the network to produce fused feature representations that are harder to tell apart and more tightly integrated.
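One conventional way to realize this min-max objective, assuming a PyTorch implementation and a discriminator ending in a sigmoid; the exact update direction for the fusion network is an interpretation of the formula above:

```python
import torch
import torch.nn.functional as F

def adversarial_losses(disc, fused, x_v, x_a):
    """Discriminator loss (maximize L_adv): score fused features as 1 and the
    local-detail (x_v) / global-semantic (x_a) features as 0. Fusion loss:
    drive the discriminator's score on fused features toward 0 to fool it."""
    d_fused = disc(fused.detach())                       # detach: this term updates D only
    d_v, d_a = disc(x_v.detach()), disc(x_a.detach())
    d_loss = (F.binary_cross_entropy(d_fused, torch.ones_like(d_fused))
              + F.binary_cross_entropy(d_v, torch.zeros_like(d_v))
              + F.binary_cross_entropy(d_a, torch.zeros_like(d_a)))
    g_score = disc(fused)                                # no detach: gradients reach F
    g_loss = F.binary_cross_entropy(g_score, torch.zeros_like(g_score))
    return d_loss, g_loss
```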
(3) Curriculum learning and difficulty adaptation
Building on the adversarial strategy in (2), a curriculum learning strategy based on sample difficulty is proposed. The sample difficulty D(x) is defined as:
D(x) = 1 − exp(−γ · (w_v·L_v + w_a·L_a))
where L_v and L_a are the local-detail and global-semantic losses, respectively, and γ is a tunable parameter. During training, the proportion of difficult samples is gradually increased according to a schedule of the form
p_hard(t) = min(1, p_0 + μ · t / T)
where t is the current training step, T is the total number of training steps, p_0 is the initial difficult-sample proportion, and μ controls the rate of difficulty increase.
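A sketch of these two quantities, assuming the reconstructed linear schedule above; p0, μ, γ, and the weights are illustrative values:

```python
import math

def hard_sample_ratio(t, total_steps, p0=0.2, mu=2.0):
    """p_hard(t) = min(1, p0 + μ·t/T): difficult-sample share grows over training."""
    return min(1.0, p0 + mu * t / total_steps)

def sample_difficulty(loss_v, loss_a, w_v=0.5, w_a=0.5, gamma=1.0):
    """D(x) = 1 − exp(−γ (w_v·L_v + w_a·L_a)), mapping losses to difficulty in [0, 1)."""
    return 1.0 - math.exp(-gamma * (w_v * loss_v + w_a * loss_a))
```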
(4) Dynamic batch normalization
To handle variability in illumination and noise conditions on semiconductor wafer production lines, a Dynamic Batch Normalization (DBN) technique is proposed. DBN dynamically adjusts the normalization parameters according to the input's statistics:
y = γ(x) · (x − μ(x)) / (σ(x) + ε) + β(x)
where γ(x) and β(x) are input-dependent scaling and offset parameters, and μ(x) and σ(x) are the input's mean and standard deviation. This adapts the model to shifts in feature distribution under different input conditions.
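A sketch of such a layer, assuming a PyTorch implementation; predicting γ(x) and β(x) from globally pooled features is an assumption, since the text does not fix the conditioning network:

```python
import torch
import torch.nn as nn

class DynamicBatchNorm2d(nn.Module):
    """y = γ(x)·(x − μ(x))/(σ(x) + ε) + β(x) with input-conditioned affine terms."""
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.affine = nn.Linear(channels, 2 * channels)    # predicts γ(x) and β(x)

    def forward(self, x):
        mu = x.mean(dim=(0, 2, 3), keepdim=True)           # per-channel batch mean μ(x)
        sigma = x.std(dim=(0, 2, 3), keepdim=True)         # per-channel batch std σ(x)
        pooled = x.mean(dim=(2, 3))                        # (N, C) global summary of x
        gamma, beta = self.affine(pooled).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1) + 1.0    # start near the identity map
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * (x - mu) / (sigma + self.eps) + beta
```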
(5) Model training
Based on the method, the specific flow of model training is as follows:
1) Initialize the model parameters θ, where θ denotes the parameters to be updated, comprising all weight matrices and bias vectors in the wafer defect detection model;
2) For each training step t:
a. Sample a batch from the data set containing difficult samples in proportion p_hard(t);
b. Forward-propagate to obtain the multi-scale features and predictions;
c. Compute the multi-scale temporal consistency loss L_tc;
d. Execute the adversarial training step, updating the discriminator D and the fusion network F;
e. Apply dynamic batch normalization;
f. Compute the total loss L_total = λ_tc·L_tc + λ_adv·L_adv;
g. Backpropagate and update the model parameters: θ ← θ − η·∇_θ L_total, where η is the learning rate.
3) Repeat step 2), and terminate training when the loss fails to decrease for N consecutive epochs. The final model parameters saved at the end of training serve as the deployment model for wafer defect detection.
In the visual recognition method described above, the deployment of the wafer defect detection model in S5 includes:
(1) Hardware platform selection and configuration:
a. Selecting suitable edge computing equipment, such as the NVIDIA Jetson series or Intel NUC high-performance embedded systems, according to the actual requirements of the semiconductor wafer production line;
b. Configuring the necessary deep learning frameworks and dependency libraries, such as CUDA and cuDNN, on the selected hardware platform to support efficient operation of the model;
c. Optimizing hardware resource allocation, and reasonably setting GPU memory use limit and CPU thread number to balance detection performance and system stability.
(2) Model integration and interface development:
a. Designing and realizing a model reasoning interface, which comprises an image preprocessing function module, a model reasoning function module and a result post-processing function module;
b. developing a communication interface with a production line control system to realize real-time transmission and feedback of detection results;
c. and a caching mechanism is constructed, so that data stream processing is optimized, and the influence of I/O operation on the detection speed is reduced.
(3) Real-time image acquisition and pretreatment:
a. the wafer image is acquired in real time through a high-speed industrial camera, so that the image quality and the acquisition frequency are ensured to meet the detection requirement;
b. Implementing an image preprocessing pipeline, including center cropping, denoising, and data enhancement, to improve input image quality;
c. and an asynchronous processing mechanism is adopted, and preprocessing is performed while the image is acquired, so that the computing resource is utilized to the maximum extent.
(4) Defect detection and result output:
a. Inputting the preprocessed image into a deployed wafer defect detection model, and executing reasoning operation;
b. Parsing the model output to extract defect type, bounding-box coordinates, and circle parameters;
c. Screening and grading the detection results against preset thresholds to reduce false positives and missed detections.
(5) And (3) visualizing and storing detection results:
a. developing a real-time visual interface, and intuitively displaying detection results, wherein the detection results comprise defect types, boundary frame coordinate information and circular parameter information;
b. Implementing local storage of detection results, including the original image, the detection results, and related data;
(6) Linkage and feedback of the production line:
a. Transmitting the detection result to a production line control system in real time for automatic decision making;
b. Developing an alarm mechanism, and timely notifying related personnel and triggering emergency response of a production line when a serious defect is detected;
c. And the correlation analysis of the detection result and the production parameter is realized, and data support is provided for process optimization.
(7) Performance evaluation and continuous optimization:
a. Establishing a periodic performance evaluation mechanism, including statistical analysis of key indexes of detection accuracy, recall and processing speed;
b. developing an automatic test flow, and evaluating the performance of the model in an actual production environment by using a standard test set;
c. And continuously optimizing model parameters and deployment strategies according to the performance evaluation result and the production feedback, and continuously improving the overall performance of the detection system.
The invention also provides a visual recognition device for computer processing, which uses the visual recognition method.
Compared with the prior art, the invention has the following beneficial effects:
(1) Multitasking collaborative optimization:
The invention introduces a multi-task collaborative optimization strategy in wafer defect detection, and integrates classification, regression and circular boundary prediction tasks into an integral frame. Conventional detection methods often focus on a single task, such as performing only defect classification or region segmentation, resulting in limitations and deviations in the detection results. Through the multi-task collaborative optimization, the method and the system can utilize the relevance and complementarity among different tasks to improve the overall performance of the model. Specifically, the cooperative mechanism enables the model to improve the detection capability of tiny and edge defects while maintaining the accuracy when processing complex and diversified defects, and remarkably enhances the comprehensiveness and robustness of detection.
(2) Dynamic multi-scale feature extraction and enhancement:
The present invention innovatively proposes a dynamic multi-scale feature extraction and enhancement network architecture to address the different scale defects on semiconductor wafers. Conventional inspection methods typically employ fixed-scale feature extraction, which tends to lose critical information in the face of complex wafer defects, particularly when dealing with defects of varying sizes and morphologies. According to the invention, by introducing a dynamic multi-scale feature extraction network, the extraction scale can be dynamically adjusted according to the geometric characteristics and the position of the defect, so that the global structure and the local detail of the defect are fully captured in the multi-scale feature fusion process. Meanwhile, the feature enhancement mechanism further optimizes the expression capability of key features, so that the model can accurately identify fine defects under a complex background, and the accuracy and the robustness of detection are improved.
(3) Adversarial collaborative learning training strategy:
To further improve the model's generalization, the invention adopts a distinctive adversarial collaborative learning training strategy. In this strategy, the model collaboratively optimizes multiple tasks while simulating the various complex conditions of the actual production environment during training. By introducing adversarial factors into the multi-stage progressive training, the model can effectively cope with different types of wafer defects and maintain a high level of inspection accuracy even when facing unknown or highly challenging defects. The strategy markedly strengthens the model's resistance to various interference factors, giving it greater robustness and reliability in practical applications.
Drawings
FIG. 1 is a diagram of a wafer image dynamic multi-scale feature enhancement network;
Fig. 2 is an overall block diagram of semiconductor wafer defect detection.
Detailed Description
FIG. 1 shows the dynamic multi-scale feature enhancement network for wafer images, which integrates the feature-weight dynamic allocation module to enhance adaptability to multi-scale features and to learn correlations between features at different positions in the wafer feature map. By highlighting important features and suppressing irrelevant ones, the mechanism first generates a spatial attention sub-module, i.e. an attention weight for each location, and then combines it with the original feature map to obtain an enhanced feature representation.
FIG. 2 is the overall block diagram of semiconductor wafer defect detection. First, original semiconductor wafer image data are collected from a real semiconductor wafer production line and preprocessed, including center cropping, denoising, and data enhancement, to ensure image quality, yielding a complete wafer defect detection data set. Second, a dynamic multi-scale feature extraction and feature enhancement network is constructed for this data set, generating a multi-level feature map containing local details and global semantic information. A multi-task collaborative wafer defect detection model is then designed on top of the multi-level feature map, outputting classification, regression, and circular boundary predictions of wafer defects and realizing visual detection of semiconductor wafer defects. Finally, progressive training with the adversarial collaborative learning strategy yields an efficient wafer defect detection model, which is deployed into the vision module of the semiconductor wafer production equipment for real-time defect detection and feedback.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.
While the foregoing describes the embodiments of the present invention, it should be understood that the present invention is not limited to the embodiments, and that various modifications and changes can be made by those skilled in the art without any inventive effort.
Claims (7)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411219159.2A CN119180988A (en) | 2024-09-02 | 2024-09-02 | A visual recognition method and device based on computer processing |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411219159.2A CN119180988A (en) | 2024-09-02 | 2024-09-02 | A visual recognition method and device based on computer processing |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN119180988A (en) | 2024-12-24 |
Family
ID=93900659
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411219159.2A Pending CN119180988A (en) | 2024-09-02 | 2024-09-02 | A visual recognition method and device based on computer processing |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119180988A (en) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119380166A (en) * | 2024-12-30 | 2025-01-28 | 杭州宇泛智能科技股份有限公司 | Safety monitoring method based on hybrid large model and neural network algorithm |
| CN119619175A (en) * | 2025-02-12 | 2025-03-14 | 北京嘉海鼎盛科技有限公司 | A semiconductor testing equipment intelligent calibration method and system |
| CN119762733A (en) * | 2025-03-10 | 2025-04-04 | 合肥综合性国家科学中心能源研究院(安徽省能源实验室) | Wire rope defect identification and positioning system and method based on real-time target detection |
| CN119963555A (en) * | 2025-04-10 | 2025-05-09 | 广东索鲁达科技有限公司 | Method and system for detecting wafer defects based on attention pyramid changes |
| CN120451176A (en) * | 2025-07-14 | 2025-08-08 | 杭州映图智能科技有限公司 | AI-based bottle cap 360-degree full-view defect high-speed detection method and system |
| CN120451176B (en) * | 2025-07-14 | 2025-09-05 | 杭州映图智能科技有限公司 | AI-based bottle cap 360-degree full-view defect high-speed detection method and system |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |