
CN115759200A - A method for constructing critical paths in deep networks based on iterative optimization of gate parameters - Google Patents

A method for constructing critical paths in deep networks based on iterative optimization of gate parameters

Info

Publication number
CN115759200A
Authority
CN
China
Prior art keywords
deep
model
gate
parameters
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211461783.4A
Other languages
Chinese (zh)
Other versions
CN115759200B (en)
Inventor
耿杰
来加威
张宇航
邓鑫洋
蒋雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202211461783.4A priority Critical patent/CN115759200B/en
Publication of CN115759200A publication Critical patent/CN115759200A/en
Application granted granted Critical
Publication of CN115759200B publication Critical patent/CN115759200B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method for constructing critical paths in deep networks based on iterative optimization of gate parameters, comprising the following steps: step one, acquiring a data set; step two, constructing a deep convolutional neural network model; step three, adding gate parameters to the trained deep convolutional neural network model; and step four, constructing a critical-path model of the deep network based on the gate parameters. The method has a simple structure and a reasonable design: the importance of each convolution kernel is computed from the gate parameters, and the kernels with high contribution are selected to obtain the deep-network critical-path model. When the constructed critical-path model is repaired, only the gate parameters are updated and optimized; the other parameters of the deep network model are not updated, and the parameters and structure of the deep network model are unchanged. The repair reduces covariate shift, and the method performs well in use.

Description

A method for constructing critical paths in deep networks based on iterative optimization of gate parameters

Technical Field

The invention belongs to the field of artificial intelligence, and in particular relates to a method for constructing critical paths in deep networks based on iterative optimization of gate parameters.

Background

Deep learning has achieved many good results, but the following problems remain. 1. The network structures on which deep convolutional models run are huge, incurring high storage and computing costs that make deployment on many hardware platforms difficult. A current mainstream network such as VGG16 has more than 130 million parameters, occupies more than 500 MB of space, and needs more than 30 billion floating-point operations to complete a single image-recognition task. 2. Deep neural networks are complex and large, and the way information propagates and evolves inside them is unclear. As a result, deep-learning-based intelligent algorithms are poorly interpretable and the mechanism of their decisions is opaque: users cannot clearly see how a deep convolutional model reaches a decision or what the decision is based on, which brings potential risk to practical applications. Research on the transparency, interpretability, and trustworthiness of deep convolutional network models has therefore grown in recent years.

Aiming at the complexity of deep-network structure and of the laws governing information evolution, this application studies a method for constructing the critical paths of a deep learning network: it identifies the key neuron nodes and paths of the deep network, quantifies neuron contributions, and simplifies the network, thereby improving the interpretability of how the deep-network structure operates.

Summary of the Invention

The technical problem to be solved by the present invention is to address the above deficiencies of the prior art by providing a method for constructing critical paths in deep networks based on iterative optimization of gate parameters. The method has a simple structure and a reasonable design: the importance of each convolution kernel is computed from the gate parameters, and the kernels with high contribution are selected to obtain the deep-network critical-path model. When the constructed critical-path model is repaired, only the gate parameters are updated and optimized; the other parameters of the deep network model are not updated, and its parameters and structure are unchanged. The repair reduces covariate shift, and the method performs well in use.

To solve the above technical problem, the present invention adopts the following technical solution: a method for constructing critical paths in deep networks based on iterative optimization of gate parameters, characterized by comprising the following steps:

Step 1. Acquire a data set: collect a number of images to obtain a data set W = (X, Y), where X denotes the input data and Y the data labels, and divide W proportionally into a training set, a validation set, and a test set.

Step 2. Build a deep convolutional neural network model: define the objective function of the model, feed the training set into the model, and solve for the optimal parameters, thereby completing training of the deep convolutional neural network model.

Step 3. Add gate parameters to the trained deep convolutional neural network model:

Step 301. Add a gate parameter φ_{i,j} to each convolution kernel of each layer of the trained model, where φ_{i,j} denotes the gate added to the j-th kernel of the i-th convolutional layer. The gate parameter φ_{i,j} has shape (1, C_{i,j}, 1, 1), where C_{i,j} is the number of image channels of φ_{i,j}, 1 ≤ i ≤ n with n the number of network layers, and 1 ≤ j ≤ m with m the number of convolution kernels in the i-th layer.

Step 302. Retrain the model with the gate parameters φ_{i,j} added on the training set, and verify it on the validation set, obtaining the retrained deep convolutional neural network model and the gate parameters φ_{i,j}.

Step 4. Construct the critical-path model of the deep network based on the gate parameters:

Step 401. Compute the importance of each convolution kernel from the gate parameters: the computer computes the importance Θ(φ_{i,j}) of the j-th kernel of the i-th layer of the retrained model according to Θ(φ_{i,j}) = KL(ΔL(φ_{i,j}) || ΔL_Ω(0)), where KL(·) denotes the KL divergence, ΔL(φ_{i,j}) denotes the difference between the predicted value of the model retrained in step 3 and the true value, ΔL_Ω(0) denotes the difference between the predicted value Ŷ of the model trained in step 2 and the true value Y, and Ω denotes the network-model parameters other than φ_{i,j}.

Step 402. Use a threshold to zero the gate parameters of low-importance kernels: if Θ(φ_{i,j}) < Θ(φ), assign φ_{i,j} = 0, where Θ(φ) denotes the optimization threshold; this yields the deep-network critical-path model.

Step 403. Repair the constructed critical-path model: feed the training set into the critical-path model for iterative training and update and optimize its gate parameters by gradient descent; gates that were set to zero stay at zero and do not participate in the update. During the iterative gate-parameter training the loss of the critical-path model is L' = (1 − α)L1 + αL2 [the definitions of L1 and L2 are given as an equation image], where α denotes a weight, X_k the k-th sample of the training set, Y_k the label of X_k, 1 ≤ k ≤ N with N the number of training samples, Δ the label-smoothing factor, p(X_k, θ⁻) the predicted probability for X_k, and θ⁻ the parameters of the critical-path model.

Step 404. Verify, on the validation set, the critical-path model whose update and optimization were completed in step 403, obtaining the repaired critical-path model.

Step 405. Feed the test set into the repaired critical-path model to obtain the classification results of the test set.

In the above method, in step 1 the images are radar images: a foreground map is first extracted from each collected radar image, Gaussian blurring is then applied, and a feature vector is extracted, yielding the data set W = (X, Y).

In the above method, in step 302 the loss function of the network model with the gate parameters φ_{i,j} added is L1(X, Y; θ, φ) = L(X, Y; θ) + λR(φ), where L(X, Y; θ) denotes the cross-entropy loss, θ the parameters of the network model, R(φ) the sparse constraint on the gate parameters [given as an equation image], and λ the weight of the sparse constraint.

In the above method, in step 401, ΔL(φ_{i,j}) is computed from the increments of the loss [the exact expression is given as an equation image]: L denotes the loss function, δL the increment of L, and δφ_{i,j} the increment of the gate parameter φ_{i,j}; Ŷ denotes the sample prediction result and Y the sample label.

Compared with the prior art, the present invention has the following advantages:

1. The invention is simple in structure, reasonable in design, and convenient to implement and operate.

2. The invention adds gate parameters to the trained deep convolutional neural network model and computes the importance of each convolution kernel from them, obtaining the kernel nodes with large contribution and strong information interaction. This improves the interpretability of how the deep-network structure operates; selecting the kernels with large contribution yields the critical-path model of the deep network and thereby simplifies it.

3. When repairing the constructed critical-path model, the invention updates and optimizes only its gate parameters by gradient descent; gates set to zero stay at zero and do not participate in the update, the other parameters of the deep network model are not updated, and its parameters and structure are unchanged.

4. During repair training the invention constructs a new loss as the superposition of two losses: L1 adds a constraint on the gate parameters φ_{i,j}, and L2 introduces a label-smoothing factor that smooths the labels, reducing label noise, improving the accuracy of the labels with respect to the data, further reducing covariate shift, and improving the recognition accuracy of the network model.

In summary, the present invention is simple in structure and reasonable in design: the importance of each convolution kernel is computed from the gate parameters, and the kernels with large contribution are selected to obtain the deep-network critical-path model. When the constructed critical-path model is repaired, only the gate parameters are updated and optimized; the other parameters of the deep network model are not updated, and its parameters and structure are unchanged. The repair reduces covariate shift, and the method performs well in use.

The technical solution of the present invention is described in further detail below with reference to the accompanying drawing and embodiments.

Brief Description of the Drawings

Fig. 1 is a schematic block diagram of the present invention.

Detailed Description

The method of the present invention is described in further detail below in conjunction with the accompanying drawing and the embodiments of the invention.

It should be noted that, in the absence of conflict, the embodiments of this application and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the drawing and in combination with the embodiments.

It should be noted that the terminology used here is intended only to describe specific embodiments and is not intended to limit the exemplary embodiments of this application. As used here, unless the context clearly indicates otherwise, the singular is intended to include the plural; furthermore, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

It should be noted that the terms "first", "second", and the like in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of this application described here can, for example, be implemented in orders other than those illustrated or described here. In addition, the terms "comprising" and "having", and any variants thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to the process, method, product, or device.

For convenience of description, spatially relative terms such as "on", "above", "on the upper surface of", and "upper" may be used here to describe the spatial relationship between one device or feature and other devices or features as shown in the figures. It should be understood that spatially relative terms are intended to encompass orientations of a device in use or operation other than the orientation depicted in the figures. For example, if a device in the figures is inverted, a device described as "above" or "over" other devices or configurations would then be oriented "below" or "under" them. Thus, the exemplary term "above" can encompass both the orientations "above" and "below". A device may also be oriented in other ways (rotated 90 degrees or in other orientations), and the spatially relative descriptions used here are interpreted accordingly.

As shown in Fig. 1, the present invention comprises the following steps:

Step 1. Acquire a data set: collect a number of images to obtain a data set W = (X, Y), where X denotes the input data and Y the data labels, and divide W proportionally into a training set, a validation set, and a test set.

In practice, the images are radar images collected for the application scenario, or the MSTAR data set is used directly; the MSTAR data set is widely used in research on radar-image target recognition.

For each collected radar image, a foreground map is first extracted, Gaussian blurring is then applied, and a feature vector is extracted. The resulting feature vector has lower complexity: features that cause shift and are unimportant are removed, the complexity of the input information is reduced, the uniformity of the sample distribution over the covariates is improved, and the model performs better.

The feature vectors are saved or stored on an electronic or computer-readable storage medium, and/or transmitted to another program or application for further processing or use.
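As an illustration, the step-one preprocessing pipeline can be sketched as follows. The patent names only the three stages (foreground extraction, Gaussian blurring, feature-vector extraction), so the concrete operators below, Otsu thresholding, a 5×5 blur kernel, and plain normalized flattening, are assumptions rather than part of the invention:

```python
import cv2
import numpy as np

def preprocess(radar_img: np.ndarray) -> np.ndarray:
    """Foreground extraction -> Gaussian blur -> feature vector (step 1).

    radar_img is assumed to be a single-channel uint8 radar image.
    """
    # Foreground mask via Otsu thresholding (assumed operator; the patent
    # does not specify how the foreground map is extracted).
    _, mask = cv2.threshold(radar_img, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    fg = cv2.bitwise_and(radar_img, radar_img, mask=mask)
    # Gaussian blurring to suppress unimportant, shift-inducing detail.
    blurred = cv2.GaussianBlur(fg, (5, 5), 0)
    # A plain normalized flattening stands in for the unspecified
    # feature-vector extraction.
    return blurred.astype(np.float32).flatten() / 255.0
```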

Step 2. Build a deep convolutional neural network model: define the objective function of the model, feed the training set into the model, and solve for the optimal parameters, thereby completing training of the deep convolutional neural network model.

It should be noted that the convolutional neural network model may be a VGG model, or a ResNet-34, ResNet-50, or ResNet-56 model. Below, "deep learning neural network model", "deep neural network model", "neural network model", "network model", and "deep network" all refer to the deep convolutional neural network model.

Step 3. Add gate parameters to the trained deep convolutional neural network model:

Step 301. Add a gate parameter φ_{i,j} to each convolution kernel of each layer of the trained model, where φ_{i,j} denotes the gate added to the j-th kernel of the i-th convolutional layer. The gate parameter φ_{i,j} has shape (1, C_i, 1, 1), where C_i is the number of image channels of φ_{i,j}, 1 ≤ i ≤ n with n the number of network layers, and 1 ≤ j ≤ m with m the number of convolution kernels in the i-th layer.

A gate parameter is a variable that can be updated by gradients. A convolutional layer extracts image features; its input data is called the input feature map and its output the output feature map. Setting a gate means multiplying the output of a convolutional layer by a gate parameter before it becomes the input of the next convolutional layer; the gate identifies properties of the convolution kernel and changes the structure of the standard neural network model. The gate parameter φ_{i,j} has shape (1, C_{i,j}, 1, 1), where C_{i,j} denotes the number of image channels of the i-th gate parameter. In practice, for a global simplification of the neural network model a gate parameter φ_{i,j} with C_{i,j} ≠ 1 is added after every convolutional layer; for a partial simplification, the convolutional layers that need no simplification have C_{i,j} = 1.
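As an illustration, such a gated convolution can be written as a minimal PyTorch module; the class name and the initialization of the gate to 1 are illustrative choices, not requirements stated in the patent:

```python
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    """Convolution whose output feature map is scaled channel-wise by a
    learnable gate of shape (1, C, 1, 1), as described in step 301."""

    def __init__(self, in_channels: int, out_channels: int,
                 kernel_size: int, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, **kwargs)
        # One gate value per output channel (i.e., per convolution kernel);
        # starting at 1 leaves the trained model's behavior unchanged.
        self.gate = nn.Parameter(torch.ones(1, out_channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The gated output is what the next convolutional layer receives.
        return self.conv(x) * self.gate
```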

Step 302. Retrain the deep convolutional neural network model with the gate parameters φ_{i,j} added on the training set. In practice, the loss function of the network model with the gate parameters φ_{i,j} added is L1(X, Y; θ, φ) = L(X, Y; θ) + λR(φ), where L(X, Y; θ) denotes the cross-entropy loss, θ the parameters of the network model, R(φ) the sparse constraint on the gate parameters [given as an equation image], and λ the weight of the sparse constraint.

After this training is complete, the convolution-kernel parameters are fixed, and the model with the gate parameters φ_{i,j} added is verified on the validation set, yielding the retrained deep convolutional neural network model and the gate parameters φ_{i,j}.

In practice, the model with the gate parameters φ_{i,j} added is sparsely trained on the training set; the sparsity constraint lowers the dependence between convolution kernels and makes them more independent, which makes the contribution of each kernel easier to interpret. During this training only the gate parameters φ_{i,j} are updated, and each update uses a subset of the training data.
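A minimal PyTorch sketch of this sparse gate training follows, assuming gates registered as in the GatedConv2d module above. The exact form of the sparse constraint R(φ) is given only as an equation image, so the ℓ1 penalty here, like the optimizer, learning rate, and lam value, is an assumption:

```python
import torch
import torch.nn as nn

# model and train_loader are assumed to exist; the gates are the
# GatedConv2d parameters named "...gate" from the sketch above.
for name, p in model.named_parameters():
    p.requires_grad = name.endswith("gate")   # only the gates are trained
gates = [p for name, p in model.named_parameters() if name.endswith("gate")]

optimizer = torch.optim.SGD(gates, lr=1e-3)   # assumed optimizer and lr
criterion = nn.CrossEntropyLoss()
lam = 1e-4                                    # sparse-constraint weight (assumed)

for x, y in train_loader:   # each step trains on a subset (mini-batch)
    # L1 = cross-entropy + lam * R(phi), with R(phi) assumed to be an l1 norm.
    loss = criterion(model(x), y) + lam * sum(g.abs().sum() for g in gates)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```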

Step 4. Construct the critical-path model of the deep network based on the gate parameters:

Step 401. Compute the importance of each convolution kernel from the gate parameters: the computer computes the importance Θ(φ_{i,j}) of the j-th kernel of the i-th layer of the retrained model according to Θ(φ_{i,j}) = KL(ΔL(φ_{i,j}) || ΔL_Ω(0)), where KL(·) denotes the KL divergence, ΔL(φ_{i,j}) denotes the difference between the predicted value of the model retrained in step 3 and the true value, ΔL_Ω(0) denotes the difference between the predicted value Ŷ of the model trained in step 2 and the true value Y, and Ω denotes the network-model parameters other than φ_{i,j}.

In a deep convolutional neural network, the role of a convolution kernel is to extract features from the input radar data. For a kernel responsible for extracting some feature, if it detects its corresponding feature in the input from the previous layer, the node is activated, allowing useful feature information to propagate to the next layer; if the node judges that the input does not contain its feature, the node is blocked, stopping useless feature information from propagating onward. Computing the contribution of each kernel from the gate parameters and obtaining the kernel nodes with large contribution and strong information interaction therefore helps make the deep network interpretable and enables construction of the critical-path model.

The contribution of every convolution kernel in every layer of the deep network is computed from the gate parameters φ_{i,j}, and the contribution serves as the basis for judging whether that kernel is a key node. In practice, ΔL(φ_{i,j}) is computed from the increments of the loss [the exact expression is given as an equation image]: L denotes the loss function, δL the increment of L, and δφ_{i,j} the increment of the gate parameter φ_{i,j}; Ŷ denotes the sample prediction result and Y the sample label.
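A sketch of the importance computation follows. The patent gives the KL formula but not how the two difference terms are normalized into distributions, so the softmax normalization below is an interpretational assumption, as is the function name:

```python
import torch
import torch.nn.functional as F

def kernel_importance(delta_l_gated: torch.Tensor,
                      delta_l_base: torch.Tensor,
                      eps: float = 1e-8) -> torch.Tensor:
    """Theta(phi_{i,j}) = KL(dL(phi_{i,j}) || dL_Omega(0)), evaluated over a
    batch of per-sample prediction-truth differences (one tensor per model)."""
    # Turn each vector of differences into a discrete distribution
    # (assumed normalization; the patent does not specify one).
    p = F.softmax(delta_l_gated.flatten(), dim=0)
    q = F.softmax(delta_l_base.flatten(), dim=0)
    # KL(p || q) = sum p * (log p - log q)
    return torch.sum(p * (torch.log(p + eps) - torch.log(q + eps)))
```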

Step 402. Use a threshold to zero the gate parameters of low-importance kernels: if Θ(φ_{i,j}) < Θ(φ), assign φ_{i,j} = 0, where Θ(φ) denotes the optimization threshold; this yields the critical-path model. In practice, an optimization threshold Θ(φ) is set and the kernels are screened by importance. A kernel whose contribution falls below the preset threshold Θ(φ) can be regarded as a "non-critical" node, so its gate parameter φ_{i,j} is assigned the value 0; assigning a gate the value 0 is equivalent to deleting the corresponding kernel. The remaining kernels are those that play a key role in processing the image information, and together they constitute the critical-path model of the deep network. The critical-path model can be used to analyze how information evolves along the critical paths of the deep network, completing an interpretable account of it.
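The screening of step 402 then reduces to a masked zeroing of the gates, sketched below; importance_scores, gates_by_kernel, and the threshold value are illustrative helpers and assumptions, not names or values given in the patent:

```python
import torch

# importance_scores[(i, j)] holds Theta(phi_{i,j}) from kernel_importance
# above, and gates_by_kernel[(i, j)] is the gate entry of kernel j in
# layer i (both assumed inputs).
threshold = 0.01   # Theta(phi), an assumed value tuned on the validation set
pruned = {}        # bookkeeping so the repair phase never revives a gate

with torch.no_grad():
    for key, gate in gates_by_kernel.items():
        if importance_scores[key] < threshold:
            gate.zero_()        # equivalent to deleting that convolution kernel
            pruned[key] = True
```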

Step 403. Repair the constructed critical-path model: feed the training set into the critical-path model for iterative training and update and optimize its gate parameters by gradient descent; gates that were set to zero stay at zero and do not participate in the update. During the iterative gate-parameter training the loss of the critical-path model is L' = (1 − α)L1 + αL2 [the definitions of L1 and L2 are given as an equation image], where α denotes a weight, X_k the k-th sample of the training set, Y_k the label of X_k, 1 ≤ k ≤ N with N the number of training samples, Δ the label-smoothing factor, p(X_k, θ⁻) the predicted probability for X_k, and θ⁻ the parameters of the critical-path model.

In practice, the loss of the iterative training is back-propagated and the gradients of the weights of the interpretable critical-path model are extracted during back-propagation; based on the extracted gradients, only the gate parameters of the critical-path model are updated and optimized by gradient descent, and the other parameters of the neural network model are not updated.

It should be noted that deleting convolution kernels from the original deep network model is likely to cause information loss, shifting the covariates of the optimized model and degrading its performance. The repair training therefore considers the optimization of the gate parameters φ_{i,j}, which is why a constraint on φ_{i,j} is added to the loss L1.

The loss L2 measures the gap between the predicted probability for X_k and the true probability; in a network model the gap between the true and predicted probability distributions manifests as loss, and the smaller the loss, the better the model's predictive classification. With Δ = 0.05, smoothing the labels reduces the influence of label noise, improves the accuracy of the labels with respect to the data, further reduces covariate shift, and improves the recognition accuracy of the network model.
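A sketch of one repair step under these definitions follows. The exact forms of L1 and L2 are equation images in the patent, so the ℓ1 gate penalty inside L1 and the use of PyTorch's built-in label smoothing for L2 (available since PyTorch 1.10, the stated software platform) are assumptions, as are alpha and lam; masks are assumed boolean tensors marking the entries zeroed in step 402:

```python
import torch
import torch.nn.functional as F

def repair_step(model, x, y, gates, masks, optimizer,
                alpha=0.5, lam=1e-4, delta=0.05):
    """One gate-only gradient-descent update with L' = (1-a)*L1 + a*L2."""
    logits = model(x)
    # L1: cross-entropy plus the (assumed l1) constraint on the gates.
    l1 = F.cross_entropy(logits, y) + lam * sum(g.abs().sum() for g in gates)
    # L2: label-smoothed cross-entropy with smoothing factor delta = 0.05.
    l2 = F.cross_entropy(logits, y, label_smoothing=delta)
    loss = (1 - alpha) * l1 + alpha * l2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Gates zeroed in step 402 stay zero and never rejoin the update.
    with torch.no_grad():
        for g, m in zip(gates, masks):
            g[m] = 0.0
    return loss.item()
```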

Step 404. Verify, on the validation set, the critical-path model whose update and optimization were completed in step 403, obtaining the repaired critical-path model.

Step 405. Feed the test set into the repaired critical-path model to obtain the classification results of the test set.

The effectiveness of the present invention can be further confirmed by the following simulation experiment:

1. Experimental conditions and methods

Hardware platform: Intel(R) Core(TM) i5-10600K CPU @ 4.10 GHz, 16.0 GB RAM;

Software platform: PyTorch 1.10.

2. Simulation content and results

In this embodiment, the VGG16 neural network model is selected as the deep network. The VGG16 model can be used for image processing, including the processing of radar-image information. Image processing is used in a wide variety of technical applications; as an example, it can locate objects of a particular kind or type in images, such as in computer-vision applications for image and video analysis, and this embodiment imposes no specific limitation.

Table 1 compares, with VGG16 as the deep network, the per-layer numbers of convolution-kernel nodes and the classification accuracy of the original deep network and of the deep-network critical-path model.

Table 1. Comparison of the original deep network and the deep-network critical-path model

[Table 1 images: per-layer convolution-kernel counts and classification accuracy for the original VGG16 network and the critical-path model]

For example, in the first layer the original number of convolution kernels is 64; after screening by contribution, only 51 kernels that matter to network performance remain. The results show that the critical-path model keeps only 1452 of the original model's convolution kernels, i.e., only 34.37% of the nodes. After simplification and repair, the recognition accuracy of the critical-path model on the data set is still 92.46%; the critical-path model is very close in performance to the original deep network model, which demonstrates its reliability and verifies the effectiveness of the construction method of this application.

Moreover, the change in the number of kernel nodes per layer shows that, in both the critical-path model and the original deep network model, the first layer has many kernel nodes, making it easy to extract enough information from the input samples for classification, and the last layer also has many kernel nodes, in order to provide enough information for the final classification. By contrast, the middle layers have few kernel nodes, indicating that as features pass from input to output the middle-layer nodes contribute little to the transfer and extraction of information: the neural network needs few high-level, abstract features for task discrimination, so the middle-layer nodes contain substantial redundancy. The critical paths can explain how signals evolve and propagate through the network model, thereby improving the interpretability of how the deep-learning network structure operates.

It should be understood that the sequence numbers of the steps in the above embodiment do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.

The above is only an embodiment of the present invention and does not limit it in any way. Any simple modification, change, or equivalent structural variation made to the above embodiment according to the technical essence of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims (4)

1. A method for constructing critical paths in deep networks based on iterative optimization of gate parameters, characterized by comprising the following steps:
step one, acquiring a data set: acquiring a plurality of images to obtain a data set W = (X, Y), wherein X represents input data and Y represents data labels, and the data set W is divided into a training set, a verification set, and a test set in proportion;
step two, constructing a deep convolutional neural network model: defining an objective function of the deep convolutional neural network model, inputting the training set into the deep convolutional neural network model, and solving the optimal parameters of the model so as to complete its training;
step three, adding gate parameters to the trained deep convolutional neural network model:
step 301, respectively adding a gate parameter φ_{i,j} to the convolution kernels of each layer of the trained deep convolutional neural network model, φ_{i,j} representing the gate parameter added to the j-th convolution kernel of the i-th convolutional layer, said gate parameter φ_{i,j} being (1, C_{i,j}, 1, 1), C_{i,j} representing the number of image channels of the gate parameter φ_{i,j}, wherein 1 ≤ i ≤ n, n represents the number of network layers, 1 ≤ j ≤ m, and m represents the number of convolution kernels of the i-th layer;
step 302, retraining the deep convolutional neural network model with the gate parameters φ_{i,j} added by using the training set, and verifying the deep convolutional neural network model with the gate parameters φ_{i,j} added by using the verification set, to obtain a retrained deep convolutional neural network model and the gate parameters φ_{i,j};
step four, constructing a critical-path model of the deep network based on the gate parameters:
step 401, calculating the importance of each convolution kernel based on the gate parameters: a computer calculates, according to the formula Θ(φ_{i,j}) = KL(ΔL(φ_{i,j}) || ΔL_Ω(0)), the importance Θ(φ_{i,j}) of the j-th convolution kernel of the i-th layer of the retrained deep convolutional neural network model, wherein KL(·) represents the KL divergence calculation, ΔL(φ_{i,j}) represents the difference between the predicted value and the true value of the deep convolutional neural network model retrained in step three, ΔL_Ω(0) represents the difference between the predicted value Ŷ of the deep convolutional neural network model trained in step two and the true value Y, and Ω represents the network-model parameters other than φ_{i,j};
step 402, zeroing the gate parameters of convolution kernels of low importance by using a threshold: if Θ(φ_{i,j}) < Θ(φ), the gate parameter φ_{i,j} is assigned the value 0, Θ(φ) representing an optimization threshold, to obtain the deep-network critical-path model;
step 403, repairing the constructed deep-network critical-path model: inputting the training set into the deep-network critical-path model for iterative training, and updating and optimizing the gate parameters of the model through a gradient descent algorithm, the zeroed gate parameters being kept at zero at all times and not participating in the update, wherein the loss function of the deep-network critical-path model in the iterative gate-parameter training is L' = (1 − α)L1 + αL2 [the definitions of L1 and L2 are given as an equation image], wherein α represents a weight, X_k represents the k-th data item in the training set, Y_k represents the sample label corresponding to X_k, 1 ≤ k ≤ N, N represents the number of samples of the training set, Δ represents the label smoothing factor, p(X_k, θ⁻) represents the predicted probability of the training sample X_k, and θ⁻ represents the parameters of the critical-path model;
step 404, verifying, by using the verification set, the deep-network critical-path model updated and optimized in step 403, to obtain a repaired deep-network critical-path model;
and step 405, inputting the test set into the repaired deep-network critical-path model to obtain a classification result of the test set.
2. The method for constructing critical paths in deep networks based on iterative optimization of gate parameters according to claim 1, characterized in that: in step one, the image is a radar image; a foreground map is first extracted from the acquired radar image, Gaussian blur processing is then performed, and a feature vector is then extracted, so that the data set W = (X, Y) is obtained.
3. The method for constructing critical paths in deep networks based on iterative optimization of gate parameters according to claim 1, characterized in that: in step 302, the loss function of the network model with the gate parameter φ_{i,j} added is L1(X, Y; θ, φ) = L(X, Y; θ) + λR(φ), wherein L(X, Y; θ) represents a cross-entropy loss function, θ represents the parameters of the network model, R(φ) represents the sparse constraint on the gate parameters [given as an equation image], and λ represents the weight of the sparse constraint.
4. The method for constructing critical paths in deep networks based on iterative optimization of gate parameters according to claim 1, characterized in that: in step 401, ΔL(φ_{i,j}) is computed from the increments of the loss [the exact expression is given as an equation image], wherein L denotes the loss function, δL denotes the increment of the loss function L, δφ_{i,j} denotes the increment of the gate parameter φ_{i,j}, Ŷ denotes the sample prediction result, and Y denotes the sample label.
CN202211461783.4A 2022-11-22 2022-11-22 A method for constructing critical paths in deep networks based on iterative optimization of gate parameters Active CN115759200B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211461783.4A CN115759200B (en) 2022-11-22 2022-11-22 A method for constructing critical paths in deep networks based on iterative optimization of gate parameters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211461783.4A CN115759200B (en) 2022-11-22 2022-11-22 A method for constructing critical paths in deep networks based on iterative optimization of gate parameters

Publications (2)

Publication Number Publication Date
CN115759200A (en) 2023-03-07
CN115759200B CN115759200B (en) 2025-06-24

Family

ID=85334510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211461783.4A Active CN115759200B (en) 2022-11-22 2022-11-22 A method for constructing critical paths in deep networks based on iterative optimization of gate parameters

Country Status (1)

Country Link
CN (1) CN115759200B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021007812A1 (en) * 2019-07-17 2021-01-21 深圳大学 Deep neural network hyperparameter optimization method, electronic device and storage medium
CN110647990A (en) * 2019-09-18 2020-01-03 无锡信捷电气股份有限公司 A tailoring method of deep convolutional neural network model based on grey relational analysis
CN111612143A (en) * 2020-05-22 2020-09-01 中国科学院自动化研究所 Compression method and system of deep convolutional neural network
CN112613610A (en) * 2020-12-25 2021-04-06 国网江苏省电力有限公司信息通信分公司 Deep neural network compression method based on joint dynamic pruning
WO2022179492A1 (en) * 2021-02-27 2022-09-01 华为技术有限公司 Pruning processing method for convolutional neural network, data processing method and devices
CN115131646A (en) * 2022-05-13 2022-09-30 西北工业大学 Deep network model compression method based on discrete coefficient

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐嘉荟: "Research on neural network compression technology based on model pruning" (基于模型剪枝的神经网络压缩技术研究), 信息通信 (Information & Communications), no. 12, 15 December 2019 (2019-12-15) *

Also Published As

Publication number Publication date
CN115759200B (en) 2025-06-24

Similar Documents

Publication Publication Date Title
Wang et al. Encoder-X: Solving unknown coefficients automatically in polynomial fitting by using an autoencoder
Zhang et al. Quantifying the knowledge in a DNN to explain knowledge distillation for classification
CN113326731A (en) Cross-domain pedestrian re-identification algorithm based on momentum network guidance
Zheng et al. Understanding the property of long term memory for the LSTM with attention mechanism
CN111353545A (en) Plant disease and insect pest identification method based on sparse network migration
CN112765415A (en) Link prediction method based on relational content joint embedding convolution neural network
CN117393036B (en) Protein multi-level semantic polymerization characterization method for drug-target affinity prediction
CN105044722B (en) The full Bayesian Discriminating Features extracting method of synthetic aperture radar target
CN113361627A (en) Label perception collaborative training method for graph neural network
CN112560966A (en) Polarimetric SAR image classification method, medium and equipment based on scattergram convolution network
CN117390407A (en) Fault identification methods, systems, media and equipment for substation equipment
CN114298278A (en) A performance prediction method of electrical equipment based on pre-training model
CN114742091A (en) Method, system and medium for identifying radar individual radiation based on convolution block attention
Liu et al. Semi‐supervised breast histopathological image classification with self‐training based on non‐linear distance metric
CN115310594A (en) Method for improving expandability of network embedding algorithm
CN114861436A (en) Method for predicting fatigue strength of steel by using graph convolution network fused with feature pyramid
CN109978138A (en) The structural reliability methods of sampling based on deeply study
CN114048837A (en) A Deep Neural Network Model Reinforcement Method Based on Distributed Brain-like Graph
Hsieh et al. Mean-shift based differentiable architecture search
CN118888007A (en) A cancer drug response prediction method based on deep transfer learning
CN118378521A (en) An adaptive sampling method for multi-model decision making based on weighted approximation to ideal points
CN115759200A (en) A key path construction method for deep network based on iterative optimization of gate parameters
US20230334379A1 (en) Energy-efficient capacitance extraction method based on machine learning
CN117591841A (en) Bayesian neural network equipment operation trend prediction method and system
CN114202669A (en) A Neural Network Search Method for Medical Image Segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant