
CN118396947A - Steel micro defect detection method suitable for edge equipment deployment - Google Patents

Steel micro defect detection method suitable for edge equipment deployment

Info

Publication number
CN118396947A
Authority
CN
China
Prior art keywords
steel
defects
defect
network
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410484467.1A
Other languages
Chinese (zh)
Inventor
李毅仁
明勇杰
王凯军
魏晓飞
何勇军
彭晶
来博文
苏敬勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute Of Technology (Shenzhen); Shenzhen Institute Of Science And Technology Innovation, Harbin Institute Of Technology
Hegang Digital Technology Co ltd
Original Assignee
Harbin Institute Of Technology (Shenzhen); Shenzhen Institute Of Science And Technology Innovation, Harbin Institute Of Technology
Hegang Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute Of Technology (Shenzhen), Shenzhen Institute Of Science And Technology Innovation, Harbin Institute Of Technology, and Hegang Digital Technology Co ltd
Priority to CN202410484467.1A
Publication of CN118396947A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection
    • G06T2207/30136Metal
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker


Abstract

The invention discloses a steel micro-defect detection method suitable for edge-device deployment, relating to the field of steel micro-defect identification in industrial environments. A steel defect image is compressed to a specified size and fed into a feature extraction network based on a U-shaped structure, whose deformable-convolution and attention modules extract the features: the dot-product attention mechanism acquires contextual features within the steel defect and, to preserve certain initial features of the defect, the extracted locally salient features are fused at fine granularity with the global contextual features to obtain the final output features. These are passed to a decoder, which performs feature mapping and outputs the steel micro-defect detection result. To ease deployment of the feature extraction network on edge devices, a knowledge distillation learning framework is provided, in which the objective function is continuously optimized through the learning of a teacher network so as to guide a student network.

Description

A method for detecting tiny defects in steel suitable for edge device deployment

Technical Field

The present invention relates to the field of steel micro-defect recognition in industrial environments, and in particular to a steel micro-defect target detection method suitable for edge device deployment.

Background

With the rapid development of science and technology, productivity has made a qualitative leap, and the world steel industry faces new opportunities and challenges. Steel is an important material basis for national construction, and demand for it keeps growing, especially in the automotive, construction, aerospace, and electronic machinery industries; surface defects in steel plates directly affect production efficiency. Steel defect target detection aims to find, via detection algorithms, all defect targets of interest in produced steel and to determine their locations and categories.

Existing steel defect detection methods generally fall into three categories: manual visual inspection, eddy-current testing, and machine vision. Manual visual inspection is the most primitive approach to steel surface defect detection; although its recognition accuracy is high, it requires heavy manpower, is time-consuming and labor-intensive, and is poorly suited to real-time detection. Eddy-current testing is used frequently in steel surface defect detection systems, but it is easily affected by environmental noise, probe lift-off, and changes in equipment structure, so the defect-position feedback signal is often degraded; signal processing must therefore be introduced beforehand to improve the signal-to-noise ratio, which raises the technical difficulty. Among machine-vision approaches, deep-learning-based defect detection performs best: convolutional neural networks extract image features of steel surface defects and, through a series of operations, identify them accurately. However, such neural network models have many parameters, heavy computation, and overly complex structures, making them difficult to deploy on edge devices in industrial environments.

To alleviate these problems, this patent proposes a steel micro-defect detection method suitable for edge device deployment. It comprises three parts: a feature extraction module based on deformable convolution and attention, a feature extraction network based on a U-shaped structure, and a learning architecture based on knowledge distillation. The deformable-convolution-and-attention module uses deformable convolution to dynamically adjust weights within the receptive field and flexibly capture image features at specific positions, while an attention mechanism makes the module focus on regions related to steel defects. The U-shaped feature extraction network consists of an encoder and a decoder: the encoder, built from multiple deformable-convolution-and-attention modules, extracts semantic features at various scales from the input image while retaining spatial information and local detail; the decoder decodes the feature maps extracted by the encoder into the final output. The knowledge-distillation learning architecture distills the feature extraction network into a more lightweight model for deployment on edge devices.

Summary of the Invention

The purpose of the present invention is to provide a method for detecting tiny defects in steel suitable for edge device deployment, so as to solve the above problems.

The present invention is achieved through the following technical solutions.

A method for detecting tiny defects in steel suitable for edge device deployment mainly includes the following steps:

S1: Use a professional camera to capture actual images of steel, thereby obtaining the required raw steel-defect image dataset. Compress the raw images to a specified size, and annotate the locations of the steel defects.

S2: Input the processed steel-defect image dataset into a convolution component to extract shallow features F of the steel surface defects, then feed F into an offset-weight learning pyramid composed of convolution blocks, which learns an offset weight matrix W to adjust the offsets (ΔPx, ΔPy) of the deformable convolution:

(ΔPx, ΔPy) = W(F)

where (ΔPx, ΔPy) are the deformable-convolution offsets, W is the output of the offset-weight learning pyramid, and F is the input shallow feature.
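The offset-learning step can be sketched as below; a minimal NumPy sketch in which the offset-weight pyramid is approximated by stacked 1×1 convolutions with ReLU (the patent does not specify the pyramid's layer count or sizes, so those are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(feat, weight):
    """Per-pixel linear map (a 1x1 convolution): (C_in, H, W) -> (C_out, H, W)."""
    return np.tensordot(weight, feat, axes=([1], [0]))

def offset_pyramid(feat, weights):
    """Hypothetical offset-weight pyramid W: stacked 1x1 convs with ReLU,
    ending in 2 channels, one per offset component (dPx, dPy)."""
    x = feat
    for w in weights[:-1]:
        x = np.maximum(conv1x1(x, w), 0.0)   # ReLU
    return conv1x1(x, weights[-1])           # (2, H, W)

F = rng.standard_normal((8, 4, 4))               # shallow feature F
Ws = [rng.standard_normal((8, 8)) * 0.1,         # two-level toy pyramid
      rng.standard_normal((2, 8)) * 0.1]
offsets = offset_pyramid(F, Ws)                  # (dPx, dPy) = W(F)
dPx, dPy = offsets[0], offsets[1]
print(offsets.shape)  # (2, 4, 4)
```

Each spatial location thus receives its own (ΔPx, ΔPy) pair, which is what lets the subsequent deformable sampling follow irregular defect shapes.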

S3: Use the learned offsets (ΔPx, ΔPy) to obtain the position pl(xl, yl) of a newly sampled pixel value l located at an irregular position of the steel defect:

pl(xl, yl) = pg((x + Δpx), (y + Δpy))

where l is the newly sampled pixel value, pl(xl, yl) its position, pg the previously sampled pixel value, and pg(x, y) the position of pg.
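A minimal sketch of this resampling step, assuming the offsets shift a regular 3×3 sampling grid around a reference pixel; the grid size and offset values are illustrative:

```python
import numpy as np

def new_sample_position(pg, dp):
    """p_l(x_l, y_l) = p_g((x + dPx), (y + dPy)): shift each regular
    grid point by its learned offset."""
    return pg + dp

# regular 3x3 sampling grid centred on reference pixel (x, y) = (5.0, 5.0)
gy, gx = np.mgrid[-1:2, -1:2]
pg = np.stack([gx + 5.0, gy + 5.0], axis=0)          # (2, 3, 3): x then y
dp = np.array([[[0.5] * 3] * 3,                      # toy dPx field
               [[-0.25] * 3] * 3])                   # toy dPy field
pl = new_sample_position(pg, dp)
print(pl[0, 1, 1], pl[1, 1, 1])  # centre moves from (5.0, 5.0) to (5.5, 4.75)
```

Because the shifted positions are generally fractional, a real implementation would follow this with bilinear interpolation of the feature map.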

S4: The feature F1 obtained by deformable convolution introduces offsets without considering information interaction between adjacent pixels. An attention mechanism is therefore used for this interaction: a pixel of F1 is denoted pq(x, y), and the dot product of pq with the transpose of each pk ∈ K is taken to compute the correlation of the reference pixel with all sampled pixels;

where Attenqi measures the relevance weight of the reference pixel to the i-th sampled pixel, K is the set of sampling offsets, k is the index of a sampled pixel, and pk is the position of an arbitrary pixel.
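The dot-product correlation can be sketched as below. The softmax normalization is an assumption (the patent's formula for Atten is not legible in this text), but it is the standard way to turn dot-product scores into weights:

```python
import numpy as np

def attention_weights(pq, keys):
    """Atten_qi: relevance of reference pixel pq to each sampled pixel i,
    via dot products pq . k_i^T followed by softmax (assumed normalization)."""
    scores = keys @ pq          # (N,) one dot product per sampled pixel
    scores -= scores.max()      # subtract max for numerical stability
    e = np.exp(scores)
    return e / e.sum()

rng = np.random.default_rng(1)
pq = rng.standard_normal(8)          # reference pixel feature vector
K = rng.standard_normal((9, 8))      # 9 sampled pixels as keys
atten = attention_weights(pq, K)
print(atten.shape)  # (9,)
```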

S5: To selectively aggregate contextual information in steel defects and retain more semantic information in the global view, the context is extracted by summing all sampled pixels pa weighted by their attention weights; meanwhile, to preserve certain initial features of the defect, the extracted context is fused at fine granularity with the original reference pixels;

where pF is the feature finally output by the deformable-convolution-and-attention feature extraction module, pa are all sampled pixels, and F1 holds certain initial features of the steel defect.
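A sketch of this aggregation-and-fusion step, assuming the fine-grained fusion is an additive residual with the initial feature F1 (the exact fusion operator is not legible in this text):

```python
import numpy as np

def aggregate_and_fuse(atten, values, f1):
    """Context = attention-weighted sum of sampled pixel values p_a;
    fused output p_F = F1 + context (additive residual is an assumed
    fusion choice that preserves the initial deformable-conv feature)."""
    context = atten @ values    # (C,): weighted sum over the samples
    return f1 + context

rng = np.random.default_rng(2)
atten = np.full(9, 1.0 / 9)               # uniform weights for the demo
V = rng.standard_normal((9, 8))           # sampled pixel values p_a
f1 = rng.standard_normal(8)               # initial feature from F1
pF = aggregate_and_fuse(atten, V, f1)
print(pF.shape)  # (8,)
```

The residual form guarantees that the module's output never discards F1 entirely, which is the stated motivation for the fusion.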

S6: The features finally output by the deformable-convolution-and-attention feature extraction module are input into the decoder for feature mapping, yielding the final steel micro-defect detection result.

S7: Finally, to suit deployment on edge devices, knowledge distillation is performed with a distillation learning architecture composed of a teacher network model and a student network model. The teacher network is continuously trained to optimize the loss function Lg, thereby guiding the optimization of the student network loss function Lst:

Lst = Lobj + Lcla + Lbox

where K is the length and width of the steel-defect detection boxes generated by the teacher network, B is the number of detection boxes, Mq is the number of detection boxes within the threshold range, sijc is the detection-box position value generated by the student network model, and tijc the one generated by the teacher network model; Lobj is the loss between the ground-truth box confidence and the confidence output by the student network, Lcla the loss between the true classification probability and the class output by the student network, and Lbox the loss between the true steel-defect position and the position output by the student model.
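The three-term student loss can be sketched as follows. The concrete loss choices (binary cross-entropy for objectness, cross-entropy for class, MAE for boxes) are assumptions consistent with the later description of Lobj, Lcla, and Lbox, not forms given explicitly here:

```python
import numpy as np

def bce(p, t, eps=1e-7):
    """Binary cross-entropy, clipped for numerical safety."""
    p = np.clip(p, eps, 1 - eps)
    return float(-(t * np.log(p) + (1 - t) * np.log(1 - p)).mean())

def student_loss(obj_p, obj_t, cls_p, cls_t, box_p, box_t, w_box=1.0):
    """L_st = L_obj + L_cla + L_box (assumed per-term loss forms)."""
    l_obj = bce(obj_p, obj_t)                                   # confidence
    l_cla = float(-(cls_t * np.log(np.clip(cls_p, 1e-7, 1.0)))  # class CE
                  .sum(axis=-1).mean())
    l_box = w_box * float(np.abs(box_p - box_t).mean())         # L_MAE boxes
    return l_obj + l_cla + l_box

l_st = student_loss(np.array([0.9, 0.2]), np.array([1.0, 0.0]),
                    np.array([[0.8, 0.2]]), np.array([[1.0, 0.0]]),
                    np.array([1.0, 2.0, 3.0, 4.0]),
                    np.array([1.1, 2.1, 2.9, 4.2]))
print(l_st)
```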

Compared with the prior art, the present invention has the following advantages and beneficial effects:

Most steel defect detection methods currently in use are based on deep neural networks, which extract and detect steel defect features. However, existing networks have many parameters and heavy computation, so they cannot be deployed effectively on edge devices; moreover, their feature extraction is limited by the receptive field of the convolution kernel, so tiny defect targets receive insufficient attention. To improve detection accuracy on tiny steel defects while making the deep model easy to deploy on edge devices, we propose a steel micro-defect detection method suitable for edge device deployment that effectively extracts both the locally salient features and the global contextual features of tiny steel defects. The method compresses the steel defect image to a specified size and feeds it into a U-shaped feature extraction network, whose deformable-convolution-and-attention modules extract the features: deformable convolution performs irregular convolution on the input image to extract locally salient features, and dot-product attention captures the contextual features of the defects. To preserve certain initial features of the defects, the locally salient features are fused at fine granularity with the global contextual features to form the final output features, which are passed to the decoder for feature mapping to produce the micro-defect detection result. Finally, to ease deployment of the feature extraction network on edge devices, a knowledge distillation learning framework is proposed in which the objective function is continuously optimized through the learning of a teacher network so as to guide the student network.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings described herein provide a further understanding of the embodiments of the present invention and constitute a part of this application; they do not limit the embodiments. In the drawings:

FIG. 1 is a flow chart of the method for detecting minor steel defects suitable for edge device deployment;

FIG. 2 shows the deformable-convolution and attention feature fusion extraction module;

FIG. 3 shows the feature extraction network based on the U-shaped structure;

FIG. 4 shows the distillation learning framework.

DETAILED DESCRIPTION OF THE EMBODIMENTS

To make the purpose, technical solutions, and advantages of the present invention clearer, the invention is further described in detail below in conjunction with the embodiments and drawings. The illustrative implementations and their descriptions only explain the invention and do not limit it. It should be noted that the present invention is already at the stage of practical development and use.

Example 1

As shown in FIG. 1 to FIG. 4, this embodiment is a method for detecting minor steel defects suitable for edge device deployment.

S1: Acquire the image to be detected. The specific steps are as follows:

(a) use a professional camera to capture actual images of the scene, thereby obtaining the required raw image dataset of steel micro-defects;

(b) compress each image to the specified size for use as the network input;

(c) annotate the compressed steel micro-defect images for use as labels in subsequent training.

A professional camera captures actual images of the steel, giving the required raw steel-defect image dataset; the images are compressed to the specified size, and the locations of the steel defects are annotated to facilitate subsequent training of the network model.
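The preprocessing in S1 might look like the sketch below; the 64×64 target size and the label fields are illustrative stand-ins, since the patent only says "specified size" and "annotate the locations":

```python
import numpy as np

def resize_to(img, out_h, out_w):
    """Nearest-neighbour resize of a grayscale image to the network's
    fixed input size (a stand-in for the compression step)."""
    h, w = img.shape
    ys = np.arange(out_h) * h // out_h   # source row for each output row
    xs = np.arange(out_w) * w // out_w   # source col for each output col
    return img[np.ix_(ys, xs)]

raw = np.arange(100 * 80, dtype=np.float32).reshape(100, 80)  # raw capture
net_in = resize_to(raw, 64, 64)                               # fixed size
# a toy annotation: defect class and bounding box in resized coordinates
# (field names are hypothetical, not from the patent)
label = {"cls": "scratch", "box": (10, 12, 20, 18)}
print(net_in.shape)  # (64, 64)
```

In practice a production pipeline would also rescale the annotated box coordinates by the same resize factors.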

S2: The deformable-convolution-and-attention feature extraction module performs feature extraction. The specific steps are as follows:

(a) input the compressed steel micro-defect images into the first convolution component to extract the shallow features F of the steel surface defects:

F = ConvBlock(X)

where X is the input steel micro-defect image dataset;

(b) feed the shallow features F into the offset-weight learning pyramid composed of convolution blocks, which learns the offset weight matrix W to adjust the deformable-convolution offsets (ΔPx, ΔPy); this process can be expressed by formula (2):

(ΔPx, ΔPy) = W(F)

where (ΔPx, ΔPy) are the deformable-convolution offsets, W is the output of the offset-weight learning pyramid, and F is the input shallow feature;

(c) use the learned offsets (ΔPx, ΔPy) to obtain the position pl(xl, yl) of the newly sampled pixel value l located at an irregular position of the steel defect:

pl(xl, yl) = pg((x + Δpx), (y + Δpy))

where l is the newly sampled pixel value, pl(xl, yl) its position, pg the previously sampled pixel value, and pg(x, y) the position of pg;

(d) the feature vector finally obtained by deformable convolution is F1; its computation is expressed as formula (5):

F1 = Dconv(O, W)

where Dconv denotes the deformable convolution, O the offsets learned by the deformable convolution, and W its learned weights;

(e) deformable convolution introduces spatial offsets without considering information interaction between adjacent pixels, so we introduce an attention mechanism for this interaction. The feature F1 obtained by deformable convolution is converted into two vectors, the key (K) and value (V) of the attention mechanism; a pixel of F1 is denoted pq(x, y), and the dot product of pq with the transpose of each pk ∈ K is taken to compute the correlation of the reference pixel with all sampled pixels, giving the attention weight Attenqk as shown in formula (6);

where Attenqi measures the relevance weight of the reference pixel to the i-th sampled pixel, K is the set of sampling offsets, k is the index of a sampled pixel, and pk is the position of an arbitrary pixel;

(f) finally, to selectively aggregate contextual information in the tiny steel defects and retain more semantic information in the global view, the context is extracted by summing all sampled pixels pa weighted by their attention weights; meanwhile, to preserve certain initial features of the defects, the extracted context is fused at fine granularity with the original reference pixels, as shown in formula (7);

where pF is the feature finally output by the deformable-convolution-and-attention feature extraction module, pa are all sampled pixels, and F1 holds certain initial features of the steel defect.

S3: The extraction process of the U-shaped feature extraction network. The specific steps are as follows:

(a) input the compressed steel image into the first convolution block of the network to obtain the preliminary shallow features F′;

(b) feed F′ sequentially through multiple deformable-convolution-and-attention feature extraction modules to extract the local features and global context features F″;

(c) feed F″ sequentially through multiple decoder structures to map and decode the features, and finally output the steel micro-defect detection result through a linear layer.

The processed steel images are input into the U-shaped feature extraction network. Convolution and mixed pooling layers integrate global spatial feature information; local features of the tiny defects are then extracted by learning the deformable-convolution offsets, and the contextual features around the defects are obtained through the attention mechanism. So that the deformable-convolution and attention modules do not lose important original features, the extracted locally salient features are fused at fine granularity with the global contextual features around the defects to obtain the final output features, which the decoder then maps and decodes into the final micro-defect detection result.
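The encoder-decoder topology of the U-shaped network can be sketched as below. Average pooling and nearest-neighbour upsampling stand in for the real deformable-convolution/attention stages, so this only illustrates the skip-connected U shape, not the actual modules:

```python
import numpy as np

def down(x):
    """Encoder stage stand-in: 2x2 average pool (halves spatial size)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(x):
    """Decoder stage stand-in: nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def u_shape(x, depth=2):
    """Skeleton of the U-shaped flow: encode while saving skips, then
    decode, merging each decoder stage with its encoder counterpart
    (additive skip connections are an illustrative choice)."""
    skips = []
    for _ in range(depth):
        skips.append(x)
        x = down(x)
    for _ in range(depth):
        x = up(x) + skips.pop()
    return x

out = u_shape(np.ones((16, 16)))
print(out.shape)  # (16, 16): output resolution matches the input
```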

S4: The knowledge-distillation learning architecture distills the feature extraction network. The specific steps are as follows:

(a) input the images of the steel-defect dataset into the teacher model and the student model for training, respectively;

(b) during training of the teacher network, supervising and comparing the student network produces the loss function Lg, which the teacher network must optimize to continuously refine its guidance of the student network; Lg is shown in formula (8);

where K is the length and width of the steel-defect detection boxes generated by the teacher network, B is the number of detection boxes, Mq is the number of detection boxes within the threshold range, sijc is the detection-box position value generated by the student network model, and tijc the one generated by the teacher network model;
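Formula (8) is not legible in this text, so the following is only a plausible reading of Lg: a mean squared error between the student values sijc and the teacher values tijc, averaged over the Mq boxes within the threshold:

```python
import numpy as np

def distill_loss(s, t, mask):
    """Assumed form of L_g: squared error between student box values
    s_ijc and teacher box values t_ijc, restricted to the M_q boxes
    within the threshold (given by mask) and averaged over them."""
    diff = (s - t) ** 2
    m_q = max(int(mask.sum()), 1)        # avoid division by zero
    return float((diff * mask).sum() / m_q)

s = np.array([[0.5, 0.4], [0.9, 0.8]])     # student detections
t = np.array([[0.6, 0.4], [0.1, 0.8]])     # teacher detections
mask = np.array([[1.0, 1.0], [0.0, 1.0]])  # boxes within threshold
lg = distill_loss(s, t, mask)
print(lg)
```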

(c)学生网络模型为U形结构的特征提取网络。在学生网络训练过程中,会产生钢材缺陷检测框预测值与检测框真实数据值之间的损失,记为Lst,Lst由三部分组成,如公式(9)所示:(c) The student network model is a feature extraction network with a U-shaped structure. During the student network training process, there will be a loss between the predicted value of the steel defect detection frame and the real data value of the detection frame, which is recorded as Lst . Lst consists of three parts, as shown in formula (9):

Lst=Lobj+Lcla+Lbox LstLobj + Lcla + Lbox

其中Lobj表示真实框置信度与学生模型输出网络的置信度之间的损失,Lcla表示真实的分类概率与学生模型输出网络的类别之间的损失,Lbox表示真实的检测钢材缺陷的位置与学生模型输出的钢材缺陷的位置之间的损失;Where L obj represents the loss between the true box confidence and the confidence of the student model output network, L cla represents the loss between the true classification probability and the category of the student model output network, and L box represents the loss between the true position of the detected steel defect and the position of the steel defect output by the student model;

(d)Lobj、Lcla和Lbox具体表示如公式(10)、(11)和(12)所示:(d) L obj , L cla and L box are specifically expressed as shown in formulas (10), (11) and (12):

where K is the length and width of the input image, B is the number of detection boxes, an indicator term marks that a detection box contains a steel defect and its complement marks that it does not, one confidence term denotes the ground-truth box confidence and the other the predicted box confidence, W<sub>box</sub> is the weight of the position-regression loss, and L<sub>MAE</sub> is the mean absolute error between the ground-truth and predicted box positions;
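Formulas (10)-(12) were lost in extraction, so the exact indicator weighting is not recoverable. A minimal sketch of the three-part student loss L_st = L_obj + L_cla + L_box under assumed tensor shapes; the binary-cross-entropy choice for the confidence and class terms is our assumption, while the W_box-weighted mean absolute error follows the text:

```python
import torch
import torch.nn.functional as F

def student_detection_loss(pred_conf, true_conf, pred_cls, true_cls,
                           pred_box, true_box, obj_mask, w_box=1.0):
    """Sketch of L_st = L_obj + L_cla + L_box.

    obj_mask is a boolean tensor marking detection boxes that contain a
    steel defect; class and box terms are computed only on those boxes.
    """
    l_obj = F.binary_cross_entropy(pred_conf, true_conf)            # confidence loss
    l_cla = F.binary_cross_entropy(pred_cls[obj_mask], true_cls[obj_mask])  # class loss
    l_box = w_box * F.l1_loss(pred_box[obj_mask], true_box[obj_mask])       # W_box * L_MAE
    return l_obj + l_cla + l_box
```

When predictions exactly match the ground truth all three terms vanish, which is the sanity check any concrete implementation of formulas (10)-(12) should satisfy.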

(e) For the teacher network in the knowledge-distillation architecture to better guide the student network's learning, the teacher-network loss and the student-network loss are combined with weights:

L<sub>f</sub> = μL<sub>st</sub> + βL<sub>g</sub>, with μ + β = 1

where L<sub>f</sub> is the final loss function optimized by the distillation framework, and μ and β are the weights of the student-model and teacher-model losses, respectively;
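The weighted combination is a one-liner; a sketch with an illustrative split (the patent does not state values for μ and β, so μ=0.7 here is purely an example):

```python
def total_distillation_loss(l_st, l_g, mu=0.7):
    """L_f = mu * L_st + beta * L_g, with the constraint mu + beta = 1.

    l_st: student loss; l_g: teacher-guidance distillation loss.
    The default split mu=0.7 is illustrative, not from the patent.
    """
    beta = 1.0 - mu          # enforce mu + beta = 1
    return mu * l_st + beta * l_g
```

Because μ + β = 1, L_f is a convex combination: it always lies between L_st and L_g, so neither term can be silently scaled away.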

The knowledge-distillation framework performs distillation learning on the U-shaped feature extraction network model, making it easy to deploy on the device side. By continuously optimizing the teacher network's objective function, the teacher learns more knowledge of tiny steel-defect features; at the same time, knowledge distillation through the teacher network guides the optimization of the student network's objective function.
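The teacher-guides-student loop described above can be sketched as a single training step. This is a schematic, not the patent's implementation: the model interfaces and the mean-squared stand-ins for L_st and L_g are assumptions, with the μ/β weighting as in the text:

```python
import torch

def distill_step(teacher, student, images, targets, optimizer, mu=0.7):
    """One illustrative distillation step for edge deployment.

    The frozen teacher produces guidance targets; the lightweight
    U-shaped student is optimized on L_f = mu*L_st + (1-mu)*L_g.
    Model and target interfaces here are hypothetical placeholders.
    """
    with torch.no_grad():
        t_out = teacher(images)                  # teacher predictions, no gradient
    s_out = student(images)
    l_st = ((s_out - targets) ** 2).mean()       # stand-in for the student loss L_st
    l_g = ((s_out - t_out) ** 2).mean()          # stand-in for the guidance loss L_g
    loss = mu * l_st + (1.0 - mu) * l_g          # L_f = mu*L_st + beta*L_g
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Only the student's parameters receive gradients; the teacher runs under `torch.no_grad()`, which is what makes the distilled student cheap enough for the edge device while the heavy teacher stays on the training side.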

The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above is merely a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (1)

1. A method for detecting tiny steel defects suitable for edge-device deployment, mainly comprising the following steps:

S1: Capture actual images of the steel with a professional camera to obtain the required raw steel-defect image dataset; compress the raw images to a specified size and annotate the locations of the steel defects.

S2: Input the processed steel-defect image dataset into a convolution component to extract shallow features F of the steel surface defects; input F into an offset-weight learning pyramid composed of convolution blocks, which learns an offset-weight matrix W to adjust the deformable-convolution offset, where (ΔP<sub>x</sub>, ΔP<sub>y</sub>) is the deformable-convolution offset, W is the output of the offset-weight learning pyramid, and F is the input shallow feature.

S3: Use the learned offset (ΔP<sub>x</sub>, ΔP<sub>y</sub>) to obtain the position p<sub>l</sub>(x<sub>l</sub>, y<sub>l</sub>) of a newly sampled pixel feature value l located at an irregular steel-defect position:

p<sub>l</sub>(x<sub>l</sub>, y<sub>l</sub>) = p<sub>g</sub>((x + Δp<sub>x</sub>), (y + Δp<sub>y</sub>))

where l is the newly sampled pixel value, p<sub>l</sub>(x<sub>l</sub>, y<sub>l</sub>) is its position, p<sub>g</sub> is the previously sampled pixel value, and p<sub>g</sub>(x, y) is the position of pixel value p<sub>g</sub>.

S4: The feature F1 obtained by deformable convolution introduces offsets without considering information exchange between adjacent pixels. Therefore, an attention mechanism is used for this exchange: a pixel in F1 is denoted p<sub>q</sub>(x, y), and the dot product of p<sub>q</sub> with the transpose of p<sub>k</sub> ∈ K computes the correlation between a specific reference pixel and all sampled pixels, where Atten<sub>qi</sub> measures the relevance weight of the reference pixel to the i-th sampled pixel, K is the set of sampling offsets, k is the index of a sampled pixel, and p<sub>k</sub> is the position of an arbitrary pixel.

S5: To selectively aggregate the contextual information in steel defects and retain more semantic information in the global view, the context is extracted by summing all sampled pixels p<sub>a</sub> with their corresponding attention weights; at the same time, to preserve certain initial features of the steel defects, the extracted context is fused at fine granularity with the original reference pixels and output, where p<sub>F</sub> is the final output feature of the deformable-convolution-and-attention feature extraction module, p<sub>a</sub> denotes all sampled pixels, and F1 denotes the initial steel-defect features.

S6: Input the final output features of the deformable-convolution-and-attention feature extraction module into the decoder for feature mapping to obtain the final tiny-steel-defect detection result.

S7: Finally, for deployment on edge devices, knowledge distillation is performed with a distillation learning architecture composed of a teacher network model and a student network model; the teacher network is continuously trained to optimize the loss function L<sub>g</sub>, thereby guiding the optimization of the student network loss function L<sub>st</sub>:

L<sub>st</sub> = L<sub>obj</sub> + L<sub>cla</sub> + L<sub>box</sub>

where K is the length and width of the steel-defect detection grid generated by the teacher network, B is the number of detection boxes, M<sub>q</sub> is the number of detection boxes within the threshold range, s<sub>ijc</sub> is the detection-box position value generated by the student network model, t<sub>ijc</sub> is that generated by the teacher network model, L<sub>obj</sub> is the loss between the ground-truth box confidence and the confidence output by the student network, L<sub>cla</sub> is the loss between the true classification probability and the class output by the student network, and L<sub>box</sub> is the loss between the true steel-defect position and the position output by the student model.
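The attention formulas referenced in steps S4-S5 of the claim did not survive extraction; a minimal dot-product sketch of the described mechanism, with feature shapes assumed for illustration:

```python
import numpy as np

def pixel_attention(ref, samples):
    """Sketch of steps S4-S5: relevance weights and context aggregation.

    ref: (d,) feature vector of the reference pixel p_q.
    samples: (k, d) feature vectors of the k sampled pixels p_k.
    Returns the Atten_qi-style weights and the attention-weighted
    context vector aggregated over all sampled pixels.
    """
    scores = samples @ ref                       # dot product p_q . p_k^T
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()                     # normalized relevance weights
    context = weights @ samples                  # weighted sum over sampled pixels
    return weights, context
```

In the full module this context vector would then be fused at fine granularity with the original reference pixel feature (step S5) before being passed to the decoder (step S6).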
CN202410484467.1A 2024-04-22 2024-04-22 Steel micro defect detection method suitable for edge equipment deployment Pending CN118396947A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410484467.1A CN118396947A (en) 2024-04-22 2024-04-22 Steel micro defect detection method suitable for edge equipment deployment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410484467.1A CN118396947A (en) 2024-04-22 2024-04-22 Steel micro defect detection method suitable for edge equipment deployment

Publications (1)

Publication Number Publication Date
CN118396947A true CN118396947A (en) 2024-07-26

Family

ID=91998397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410484467.1A Pending CN118396947A (en) 2024-04-22 2024-04-22 Steel micro defect detection method suitable for edge equipment deployment

Country Status (1)

Country Link
CN (1) CN118396947A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119338827A (en) * 2024-12-20 2025-01-21 深圳亚太航空技术股份有限公司 Surface detection method and system for precision fasteners
CN119338827B (en) * 2024-12-20 2025-03-21 深圳亚太航空技术股份有限公司 Surface detection method and system for precision fasteners
CN120013927A (en) * 2025-04-15 2025-05-16 北京妙想科技有限公司 Industrial product surface defect detection method based on AI big model

Similar Documents

Publication Publication Date Title
CN111768388B (en) A product surface defect detection method and system based on positive sample reference
CN118396947A (en) Steel micro defect detection method suitable for edge equipment deployment
CN112967271B (en) A Casting Surface Defect Recognition Method Based on Improved DeepLabv3+ Network Model
CN112101138B (en) Bridge inhaul cable surface defect real-time identification system and method based on deep learning
CN115619718A (en) Steel plate defect real-time detection system and method based on embedded edge platform
CN112381175A (en) Circuit board identification and analysis method based on image processing
CN112037219A (en) Metal surface defect detection method based on two-stage convolution neural network
CN108416774A (en) A Fabric Type Recognition Method Based on Fine-grained Neural Network
CN112070727A (en) Metal surface defect detection method based on machine learning
CN114240886B (en) Steel picture defect detection method in industrial production based on self-supervision contrast characterization learning technology
CN114331961A (en) Method for defect detection of objects
CN116612106A (en) A method for surface defect detection of optical components based on YOLOX algorithm
CN109815957A (en) A text recognition method based on color image in complex background
CN113506239A (en) Strip steel surface defect detection method based on cross-stage local network
Ma et al. Online visual end-to-end detection monitoring on surface defect of aluminum strip under the industrial few-shot condition
CN109978014A (en) A kind of flexible base board defect inspection method merging intensive connection structure
CN114511519A (en) An image processing-based detection method for missing bolts at the bottom of trains
CN118052770A (en) Surface defect detection method based on semi-supervised training strategy
CN117609925A (en) Industrial anomaly detection method and system based on multi-teacher network knowledge distillation
CN117893497A (en) Wood board surface defect detection method based on improved Yolov-tiny network
CN116721300A (en) Prefabricated part apparent disease target detection method based on improved YOLOv3
CN116823717A (en) Neural network model for defect detection and its training method, system and equipment
CN114972246A (en) A deep learning-based surface defect detection method for die-cutting products
CN107025439B (en) Lip region feature extraction and normalization method based on depth data
CN116704526B (en) Gongchipu scanning robot and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination