
CN110647990A - A tailoring method of deep convolutional neural network model based on grey relational analysis - Google Patents


Info

Publication number
CN110647990A
CN110647990A
Authority
CN
China
Prior art keywords
convolution kernel
importance
neural network
network model
reference sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910884247.7A
Other languages
Chinese (zh)
Inventor
黄世青
白瑞林
李新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XINJE ELECTRONIC CO Ltd
Original Assignee
XINJE ELECTRONIC CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XINJE ELECTRONIC CO Ltd filed Critical XINJE ELECTRONIC CO Ltd
Priority to CN201910884247.7A priority Critical patent/CN110647990A/en
Publication of CN110647990A publication Critical patent/CN110647990A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for pruning a deep convolutional neural network model based on grey relational analysis, comprising: performing data augmentation on target data to obtain more training data; training an untrained initial network model on the training data to obtain a set of model parameters fitting the training data as the experimental model to be pruned; using grey relational analysis to quantify the importance of each convolution kernel in the experimental model, obtaining a quantized importance value for each kernel; deriving the importance of all convolution kernels from these quantized values and taking the least important kernel as the target kernel; and repeatedly pruning the target kernel and the related kernels of the next layer until a stopping condition is met. The method accurately identifies the convolution kernels whose removal least affects accuracy, raises the pruning ratio of the model while preserving accuracy, and speeds up inference of the pruned model.

Description

Pruning Method for a Deep Convolutional Neural Network Model Based on Grey Relational Analysis

Technical Field

The present invention relates to the field of neural networks, and in particular to a method for pruning a deep convolutional neural network model based on grey relational analysis.

Background

Convolutional neural networks have achieved remarkable theoretical and technical breakthroughs in image classification, object detection, and image segmentation, reaching recognition accuracies the market is willing to accept. However, their huge computation and storage requirements make them difficult to deploy on embedded terminal devices with limited computing power and storage space. Pruning the model structure to accelerate inference and reduce model size is therefore of great significance for the widespread application of convolutional neural networks. Various techniques for speeding up neural network computation already exist, such as compact network design, model distillation, low-rank decomposition, model quantization, and model pruning. As a speed-up method that is easy to implement, preserves accuracy well, and yields a clear acceleration, model pruning has attracted increasing attention.

In model pruning, the key step is evaluating the importance of the convolution kernels. Whether the kernels whose removal least affects the result can be identified accurately determines whether the pruned model retains its accuracy, and also determines the maximum speed-up and compression ratio the pruning algorithm can achieve.

Kernel evaluation methods can be divided into two categories according to the object they evaluate: data-driven and parameter-driven.

Parameter-driven methods consume little time during pruning, but they affect model accuracy considerably and cannot achieve large speed-ups. They examine the model parameters W directly, evaluating the importance of a kernel channel by the sum of its parameter values or by whether the values exceed a threshold. Since evaluating importance only requires a single pass over the kernel parameters W and a simple summation, with no extra or repeated computation, the pruning process itself takes little time.

Data-driven methods keep model accuracy high even after large-scale pruning. They use the training data and the model parameters to obtain the activations of every layer, and evaluate kernel importance from statistical properties of those activations. Because every image must be fed through the network to obtain the per-layer activations, the computation is relatively heavy and pruning takes longer. However, since the activations produced by a large training set carry more information, selecting pruning targets from the activations identifies the most redundant kernel channels more accurately.

Different kernel evaluation methods rank kernels using the data characteristics of different objects. The simpler the information an object contains, the less computation and the shorter the processing time during pruning, but the larger the impact on model accuracy, so only small pruning ratios are achievable. The richer the information, the heavier the computation and the longer the processing time, but the smaller the impact on accuracy, so large pruning ratios become possible. A good kernel evaluation method should prune as many kernels as possible while preserving accuracy, increasing the model's inference speed so that real-time requirements on embedded platforms can be met.

Summary of the Invention

In view of the above problems, the purpose of the present invention is to propose a method for pruning a deep convolutional neural network model based on grey relational analysis, so as to accurately identify the convolution kernels whose removal least affects accuracy, raise the pruning ratio while preserving accuracy, and speed up inference of the pruned model.

To achieve the above purpose, the technical solution adopted in the embodiments of the present invention is as follows:

A method for pruning a deep convolutional neural network model based on grey relational analysis, comprising:

performing data augmentation on the target data to obtain more training data;

training an untrained initial network model on the training data to obtain a set of model parameters fitting the training data as the experimental model to be pruned;

using grey relational analysis to quantify the importance of each convolution kernel in the experimental model, obtaining a quantized importance value for each kernel;

deriving the importance of all convolution kernels from the quantized values and taking the least important kernel as the target kernel;

repeatedly pruning the target kernel and the related kernels of the next layer until a stopping condition is met.

Further, the target data is image data.

Further, the data augmentation includes horizontal flipping or brightness fine-tuning.

Further, training the untrained initial network model on the training data comprises:

training the untrained initial network model on the training data by stochastic gradient descent, so that the loss function reaches its global minimum.

Further, the stopping condition is based on the number of floating-point operations (FLOPs):

$$\mathrm{FLOPs}=\sum_{i=1}^{L} h_i \cdot w_i \cdot c_i \cdot k_i^2 \cdot n_i$$

where L is the total number of layers of the neural network, i is the layer index, h, w, and c are the height, width, and depth of the current layer's input feature map, n is the depth of the output feature map, and k is the kernel size.

Further, using grey relational analysis to quantify the importance of each convolution kernel in the experimental model comprises:

converting the feature layer of each convolution kernel in the experimental model into a two-dimensional matrix by global average pooling;

merging the two-dimensional matrices produced by global average pooling of all kernel feature layers into an analysis matrix;

selecting a reference sequence within the analysis matrix;

computing the relational coefficient between each comparison sequence of the analysis matrix and the reference sequence;

computing the importance of the reference sequence from the relational coefficients;

obtaining the quantized channel importance of the convolution kernel from the importance of the reference sequence.

Further, obtaining the quantized channel importance from the importance of the reference sequence comprises:

adding a regularization function over the layer index to the importance of the reference sequence;

obtaining from the regularization function a quantized kernel channel importance value that is independent of the layer.

Further, the relational coefficient is:

$$\xi_k(j)=\frac{\min_k \min_j \left|x_0(j)-x_k(j)\right| + \rho \max_k \max_j \left|x_0(j)-x_k(j)\right|}{\left|x_0(j)-x_k(j)\right| + \rho \max_k \max_j \left|x_0(j)-x_k(j)\right|}$$

where $\xi_k(j)$ denotes the degree of association between the k-th comparison sequence and the reference sequence, $\rho$ is the distinguishing coefficient, and $x_0(j)$ and $x_k(j)$ are elements of the analysis matrix.

Further, computing the importance of the reference sequence from the relational coefficients comprises:

obtaining the average relational degree between the comparison sequences and the reference sequence from the relational coefficients; the higher the average relational degree, the more similar the features extracted by the kernel channel represented by the reference sequence are to the features extracted by the other channels, i.e. the lower the importance of the kernel channel represented by the reference sequence.

Further, the importance is denoted $s_i^j$, where i is the layer index, j is the index of the convolution kernel channel, and $s_i^j$ represents the importance of the j-th kernel channel.

This embodiment has the following beneficial effects:

In this embodiment, grey relational analysis is used to quantify the importance of each convolution kernel in the experimental model; the importance of each kernel is judged from the quantized values, the pruning targets are selected by importance, and the target kernel and its related kernels are pruned. This accurately identifies the kernels whose removal least affects accuracy, raises the pruning ratio while preserving accuracy, and speeds up inference of the pruned model.

The method of this embodiment uses grey relational analysis to evaluate kernel importance and prunes the kernels that contribute little to the result, reducing computation and increasing inference speed. The kernels of all layers are evaluated, and kernel importance can be compared across layers, so no per-layer pruning ratio needs to be preset before pruning. The ratio of FLOPs before and after pruning serves as the stopping signal, so pruning can stop as soon as the desired speed-up is reached, avoiding excessive accuracy loss. The method is universal and can be widely applied to the various common networks and their variants. Compared with ordinary data-driven evaluation methods, the grey relational analysis adopted here, as a way of quantifying the relations between factors, can accurately find the two most closely related factors among many; it is well suited to using the relations between convolution kernels as the pruning criterion, so that model accuracy remains essentially unchanged even when a large proportion of the kernels is pruned. The pruning percentage is thus increased while accuracy is preserved, speeding up the model at inference time.

The technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and embodiments.

Description of the Drawings

Fig. 1 is a flowchart of the method for pruning a deep convolutional neural network model based on grey relational analysis according to an embodiment of the present invention;

Fig. 2 is a schematic block diagram of the pruning method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of pruning a convolution kernel channel of the current layer together with the related channels of the next layer, according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of pruning a residual network according to an embodiment of the present invention;

Fig. 5 is a flowchart of the convolution kernel importance evaluation according to an embodiment of the present invention.

Detailed Description of the Embodiments

The preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here are intended only to illustrate and explain the present invention, not to limit it.

As shown in Fig. 1, a method for pruning a deep convolutional neural network model based on grey relational analysis comprises:

S101: performing data augmentation on the target data to obtain more training data;

S102: training an untrained initial network model on the training data to obtain a set of model parameters fitting the training data as the experimental model to be pruned;

S103: using grey relational analysis to quantify the importance of each convolution kernel in the experimental model, obtaining a quantized importance value for each kernel;

S104: deriving the importance of all convolution kernels from the quantized values and taking the least important kernel as the target kernel;

S105: repeatedly pruning the target kernel and the related kernels of the next layer until the stopping condition is met.
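Steps S101 to S105 can be sketched as a minimal pruning loop in NumPy. This is an illustrative sketch only: the two-layer toy network, the feature-map sizes, and the filter L1-norm used as a stand-in importance score are all invented here; the patent's actual grey-relational scoring is described in the embodiments below. It shows the structural part of S104/S105: delete the least important output channel of one layer and the matching input channel of the next layer, repeating until the FLOPs-based stop condition holds.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-layer conv net: weights[i] has shape (n_out, n_in, k, k).
weights = [rng.normal(size=(8, 3, 3, 3)), rng.normal(size=(16, 8, 3, 3))]
feat_hw = [(32, 32), (32, 32)]           # input feature-map size per layer

def flops(ws):
    # h * w * c * k^2 * n summed over layers, as in the patent's stop criterion.
    return sum(h * w * W.shape[1] * W.shape[2] ** 2 * W.shape[0]
               for (h, w), W in zip(feat_hw, ws))

def channel_importance(W):
    # Stand-in score (filter L1 norm); the patent uses grey relational analysis.
    return np.abs(W).sum(axis=(1, 2, 3))

base = flops(weights)
target_ratio = 0.7                       # stop once FLOPs drop below 70% of base
while flops(weights) > target_ratio * base:
    scores = channel_importance(weights[0])
    j = int(np.argmin(scores))           # least important output channel of layer 0
    weights[0] = np.delete(weights[0], j, axis=0)  # prune the filter itself
    weights[1] = np.delete(weights[1], j, axis=1)  # prune matching input channel of next layer
```

Note that pruning one filter shrinks both the layer that owns it and the input depth of the following layer, which is exactly why S105 speaks of the target kernel "and the related kernels of the next layer".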

In a specific embodiment, the target data may be image data, and the data augmentation includes horizontal flipping, brightness fine-tuning, and the like.

In a specific embodiment, training the untrained initial network model on the training data comprises:

training the untrained initial network model on the training data by stochastic gradient descent, so that the loss function reaches its global minimum.

In a specific embodiment, the stopping condition is based on the number of floating-point operations (FLOPs):

$$\mathrm{FLOPs}=\sum_{i=1}^{L} h_i \cdot w_i \cdot c_i \cdot k_i^2 \cdot n_i$$

where L is the total number of layers of the neural network, i is the layer index, h, w, and c are the height, width, and depth of the current layer's input feature map, n is the depth of the output feature map, and k is the kernel size.
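The stop criterion can be evaluated directly from the per-layer shapes. The sketch below follows the patent's definition; the three example layer dimensions are invented for illustration.

```python
def conv_flops(layers):
    """Sum of h * w * c * k^2 * n over all convolution layers, where
    (h, w, c) are the input feature map's height/width/depth, k is the
    kernel size, and n is the output feature map's depth."""
    return sum(h * w * c * k * k * n for (h, w, c, k, n) in layers)

# Hypothetical 3-layer network: one (h, w, c, k, n) tuple per layer.
layers = [(224, 224, 3, 3, 64), (112, 112, 64, 3, 128), (56, 56, 128, 3, 256)]
total = conv_flops(layers)
```

The ratio `conv_flops(pruned_layers) / conv_flops(layers)` then serves as the pruning-stop signal described above.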

In a specific embodiment, using grey relational analysis to quantify the importance of each convolution kernel in the experimental model comprises:

converting the feature layer of each convolution kernel in the experimental model into a two-dimensional matrix by global average pooling;

merging the two-dimensional matrices produced by global average pooling of all kernel feature layers into an analysis matrix;

selecting a reference sequence within the analysis matrix;

computing the relational coefficient between each comparison sequence of the analysis matrix and the reference sequence;

computing the importance of the reference sequence from the relational coefficients;

obtaining the quantized channel importance of the convolution kernel from the importance of the reference sequence.

In a specific embodiment, obtaining the quantized channel importance from the importance of the reference sequence comprises:

adding a regularization function over the layer index to the importance of the reference sequence;

obtaining from the regularization function a quantized kernel channel importance value that is independent of the layer.

Further, the relational coefficient is:

$$\xi_k(j)=\frac{\min_k \min_j \left|x_0(j)-x_k(j)\right| + \rho \max_k \max_j \left|x_0(j)-x_k(j)\right|}{\left|x_0(j)-x_k(j)\right| + \rho \max_k \max_j \left|x_0(j)-x_k(j)\right|}$$

where $\xi_k(j)$ denotes the degree of association between the k-th comparison sequence and the reference sequence, $\rho$ is the distinguishing coefficient, and $x_0(j)$ and $x_k(j)$ are elements of the analysis matrix.

In a specific embodiment, computing the importance of the reference sequence from the relational coefficients comprises:

obtaining the average relational degree between the comparison sequences and the reference sequence from the relational coefficients; the higher the average relational degree, the more similar the features extracted by the kernel channel represented by the reference sequence are to the features extracted by the other channels, i.e. the lower the importance of the kernel channel represented by the reference sequence.

In a specific embodiment, the importance is denoted $s_i^j$, where i is the layer index, j is the index of the convolution kernel channel, and $s_i^j$ represents the importance of the j-th kernel channel.

In a specific application scenario,

this embodiment improves the data-driven model pruning approach by using grey relational analysis to quantify the importance of the convolution kernel channels and deleting the unimportant channels, thereby reducing computation and increasing speed. The overall algorithm consists mainly of data augmentation, pre-training, kernel evaluation, and the pruning stop condition, as shown in Fig. 2.

The specific implementation steps are:

(1) Perform data augmentation on the image data, including random horizontal flipping and random brightness fine-tuning, to obtain more training data so that the model can better fit various environmental conditions. The horizontal flip is expressed as:

I'(a, b) = I(Width - a, b),

where I(a, b) denotes the image and Width is the image width.
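The two augmentations of step (1) can be sketched on a NumPy image array. The H x W x C layout, the [0, 1] value range, and the `max_delta` parameter are assumptions made for this illustration.

```python
import numpy as np

def horizontal_flip(image: np.ndarray) -> np.ndarray:
    """Mirror an H x W x C image left-right: I'(a, b) = I(Width - a, b)."""
    return image[:, ::-1, :]

def random_brightness(image: np.ndarray, max_delta: float = 0.1) -> np.ndarray:
    """Shift brightness by a random delta in [-max_delta, +max_delta]."""
    delta = np.random.uniform(-max_delta, max_delta)
    return np.clip(image + delta, 0.0, 1.0)

img = np.random.rand(4, 6, 3)            # toy 4x6 RGB image with values in [0, 1]
flipped = horizontal_flip(img)
bright = random_brightness(img)
```

Flipping twice recovers the original image, which is a quick sanity check on the index arithmetic.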

(2) Train the initial network model on the augmented data produced in step (1) to obtain a set of model parameters fitting the training data as the experimental model. Stochastic gradient descent (SGD) is used to train the model parameters, with the update rule:

$$w_j \leftarrow w_j - \alpha \cdot \frac{1}{m}\sum_{i=1}^{m}\frac{\partial \mathcal{L}\left(x^{(i)}, y^{(i)}; W\right)}{\partial w_j}$$

where $w_j$ is the j-th element of the model parameters W, $\alpha$ is the learning rate, m is the batch size, and $x^{(i)}$ and $y^{(i)}$ are the training images and labels, respectively.
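The update rule above can be demonstrated on a toy least-squares objective. Everything here (the loss, the synthetic data, the learning rate) is invented for illustration; only the shape of the update, parameter minus learning rate times batch-averaged gradient, mirrors the formula.

```python
import numpy as np

def sgd_step(W, X_batch, y_batch, lr=0.1):
    """One SGD step for the least-squares loss L = (1/2m) * ||X W - y||^2.

    The batch-averaged gradient is (1/m) * X^T (X W - y), so the update is
    w_j <- w_j - (lr/m) * sum_i dL(x_i, y_i)/dw_j, as in the patent's formula.
    """
    m = len(y_batch)
    grad = X_batch.T @ (X_batch @ W - y_batch) / m
    return W - lr * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
true_W = np.array([1.0, -2.0, 0.5])
y = X @ true_W                            # noiseless targets, so W should recover true_W
W = np.zeros(3)
for _ in range(500):                      # full batch here for simplicity
    W = sgd_step(W, X, y, lr=0.1)
```

After enough steps the parameters converge to the generating weights, i.e. the loss reaches its minimum as required of the pre-training stage.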

(3) Use grey relational analysis to quantify the importance of each convolution kernel in the experimental model obtained in step (2). First, global average pooling converts the i-th layer feature map, originally of shape $h_i \times w_i \times n_i$, into a feature vector $V_i^l$ of shape $1 \times n_i$. The m images of the training set are fed into the network one by one to obtain a tensor $T_i$ of shape $m \times n_i$, with m as the number of sequences for the grey relational analysis and $n_i$ as the number of factors. The features of the first kernel channel are taken as the reference sequence; the absolute differences between corresponding elements of the reference sequence and each comparison sequence are computed, and the relational coefficient of each comparison sequence with the reference sequence is obtained. The above steps are repeated with the features of the second kernel channel as the reference sequence to compute its importance, and so on until the importance of all kernel channels has been obtained. After regularization, a quantized kernel channel importance value independent of the layer is finally obtained, as shown in Fig. 5.

(3.1) Global average pooling:

The output feature of each layer is a three-dimensional matrix, which is inconvenient to process and analyze directly. Global average pooling converts the feature layer into a two-dimensional matrix. The specific process is as follows:

if the output feature of the i-th layer for the l-th image is $F_i^l$, the global average pooling is:

$$V_i^l(j)=\frac{1}{h_i \times w_i}\sum_{a=1}^{h_i}\sum_{b=1}^{w_i}F_i^l(a,b,j),\qquad j=1,\dots,n_i,$$

where $V_i^l$ is the pooled feature vector.
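Global average pooling is a single spatial mean per channel; a minimal NumPy sketch (the 7 x 7 x 16 toy shape is an assumption):

```python
import numpy as np

def global_average_pool(feature_map: np.ndarray) -> np.ndarray:
    """Collapse an (h, w, n) feature map into an (n,)-vector:
    V(j) = (1 / (h * w)) * sum over a, b of F(a, b, j)."""
    return feature_map.mean(axis=(0, 1))

F = np.random.rand(7, 7, 16)             # toy layer output for one image
V = global_average_pool(F)               # one scalar per channel
```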

(3.2) Obtaining the analysis matrix:

To obtain the importance of each convolution kernel channel accurately, the analysis must use the information of all m images in the dataset. That is, for each layer's output features, all m images are fed into the convolutional neural network, the output features of that layer are obtained and globally average-pooled, yielding m vectors $V_i^l$, which are finally merged into the matrix $T_i$:

$$T_i=\begin{bmatrix}V_i^1\\V_i^2\\\vdots\\V_i^m\end{bmatrix}=\begin{bmatrix}V_i^1(1)&V_i^1(2)&\cdots&V_i^1(n_i)\\V_i^2(1)&V_i^2(2)&\cdots&V_i^2(n_i)\\\vdots&\vdots&\ddots&\vdots\\V_i^m(1)&V_i^m(2)&\cdots&V_i^m(n_i)\end{bmatrix}$$

where $V_i^1(1)$ is the first element of the vector $V_i^1$. The matrix $T_i$ serves as the analysis matrix, with m as the number of sequences analyzed and $n_i$ as the number of factors analyzed.

(3.3)选择参考序列:(3.3) Select the reference sequence:

Before each calculation, one factor (one convolution kernel channel) must be selected as the reference sequence; the result of the calculation is the average relational degree between the remaining factors and the reference factor, so the overall procedure selects each factor in turn as the reference. When the n_i-th convolution kernel is taken as the reference factor, the n_i-th row of the matrix T_i is moved to the first row, giving the rearranged matrix T_i′.
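Selecting each factor in turn as the reference amounts to reordering the sequence matrix; a minimal sketch of moving the j-th sequence to the front (the row-per-sequence layout here is an assumption for illustration):

```python
import numpy as np

def put_reference_first(seqs, j):
    """Reorder an (n_factors, m) sequence matrix so that row j (the
    chosen reference sequence) becomes row 0; the other rows keep
    their relative order."""
    order = [j] + [k for k in range(seqs.shape[0]) if k != j]
    return seqs[order]

S = np.array([[1, 2], [3, 4], [5, 6]])
print(put_reference_first(S, 2))  # row 2 moved to the front
```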

(3.4) Calculate the relational coefficients:

The relational coefficient between each comparison sequence and the reference sequence is calculated as:

ξ_k(l) = ( min_k min_l |x_0(l) − x_k(l)| + ρ · max_k max_l |x_0(l) − x_k(l)| ) / ( |x_0(l) − x_k(l)| + ρ · max_k max_l |x_0(l) − x_k(l)| )

where x_0 denotes the reference sequence, x_k the k-th comparison sequence, ξ_k(l) the degree of association between the k-th comparison sequence and the reference sequence at the l-th point, and ρ is the resolution coefficient.
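The classic grey relational coefficient can be computed in a vectorized way; this sketch follows the standard formula (the two-level min/max over all comparison sequences and points is an assumption consistent with conventional grey relational analysis):

```python
import numpy as np

def relational_coefficients(ref, cmps, rho=0.5):
    """Grey relational coefficients between a reference sequence and
    each comparison sequence.

    ref:  (m,) reference sequence x_0
    cmps: (K, m) comparison sequences x_k
    Returns a (K, m) array xi[k, l] following
    xi = (d_min + rho * d_max) / (|x_0(l) - x_k(l)| + rho * d_max).
    """
    d = np.abs(cmps - ref)           # |x_0(l) - x_k(l)| for all k, l
    d_min, d_max = d.min(), d.max()  # two-level min/max over k and l
    return (d_min + rho * d_max) / (d + rho * d_max)

ref = np.array([1.0, 2.0, 3.0])
cmps = np.array([[1.0, 2.0, 3.0],   # identical to the reference
                 [2.0, 3.0, 5.0]])
xi = relational_coefficients(ref, cmps)
print(xi[0])  # all ones: the first sequence coincides with the reference
```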

(3.5) Calculate the importance of the reference sequence:

The higher the average relational degree between the comparison sequences and the reference sequence, the more similar the features extracted by the convolution kernel channel represented by the reference sequence are to the features extracted by the remaining channels, i.e., the lower the importance of that convolution kernel channel:

γ_j = (1 / (n_i − 1)) · Σ_{k=1, k≠j}^{n_i} (1 / m) · Σ_{l=1}^{m} ξ_k(l)

where γ_j represents the importance value of the j-th convolution kernel channel; the channel's importance is inversely proportional to the magnitude of this value.
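Putting the coefficient and averaging steps together, the per-channel relational degree can be sketched as follows (the exact scope of the min/max and the column-per-channel layout are assumptions, not details stated in the patent):

```python
import numpy as np

def channel_relational_degree(T, j, rho=0.5):
    """Average relational degree of channel j against all other channels.

    T is the (m, n_i) analysis matrix; column j is taken as the
    reference sequence. A high value means channel j's features
    resemble the other channels', i.e. the channel is less important.
    """
    ref = T[:, j]
    cmps = np.delete(T, j, axis=1).T            # remaining channels as sequences
    d = np.abs(cmps - ref)
    xi = (d.min() + rho * d.max()) / (d + rho * d.max())
    return xi.mean()                            # mean over sequences and points

rng = np.random.default_rng(1)
T = rng.normal(size=(6, 4))                     # m=6 images, n_i=4 channels
gammas = [channel_relational_degree(T, j) for j in range(4)]
print(gammas)
```

The channel with the highest γ would be the first pruning candidate, since importance is inversely proportional to γ.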

(3.6) Calculate the convolution kernel channel importance:

Because the input distributions of different layers differ, the ranges of the feature values of their channels also differ, so an L2 normalization over each layer must be added when comparing importance across layers. This finally yields a quantized convolution kernel channel importance value s_i^j that is independent of the layer, where i denotes the layer index and j the index of the convolution kernel channel:

s_i^j = γ_i^j / sqrt( Σ_{j=1}^{n_i} (γ_i^j)² )
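The per-layer L2 normalization can be sketched directly (a minimal illustration; the list-of-arrays representation is an assumption):

```python
import numpy as np

def normalize_per_layer(gammas_by_layer):
    """L2-normalize the relational degrees within each layer so that
    importance values are comparable across layers:
    s_i^j = gamma_i^j / sqrt(sum_j gamma_i^j ** 2)."""
    return [g / np.linalg.norm(g) for g in gammas_by_layer]

layers = [np.array([3.0, 4.0]), np.array([1.0, 2.0, 2.0])]
scores = normalize_per_layer(layers)
print(scores[0])  # [0.6 0.8]
```

After normalization every layer's score vector has unit L2 norm, so a score from a shallow layer can be compared with one from a deep layer.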

(4) Using the quantized importance values computed for all convolution kernels, the least important kernel is taken as the pruning target. The target kernel and the corresponding channels of the related kernels in the next layer are pruned, after which the network is trained to compensate for the loss in performance. This is repeated until the pruning stop condition is met; finally, the network is trained again to recover the accuracy lost by the model during pruning, as shown in Figure 3.
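The iterative prune-then-recover loop can be sketched on toy data (the score and FLOPs bookkeeping below is a hypothetical stand-in for the real network; in the actual method each removal is followed by fine-tuning, which is omitted here):

```python
def prune_until_target(scores, total_flops, flops_per_channel, target_ratio):
    """Toy sketch of the pruning loop: repeatedly drop the globally
    least-important channel until the FLOPs ratio meets the target.

    scores: dict mapping (layer, channel) -> importance s_i^j
    flops_per_channel: FLOPs attributed to each channel
    Returns the pruned channels (in order) and the remaining FLOPs.
    """
    scores = dict(scores)
    remaining = total_flops
    pruned = []
    while remaining / total_flops > target_ratio and scores:
        key = min(scores, key=scores.get)      # least important channel
        pruned.append(key)
        remaining -= flops_per_channel[key]
        del scores[key]
    return pruned, remaining

scores = {("conv1", 0): 0.9, ("conv1", 1): 0.2, ("conv2", 0): 0.5}
fpc = {("conv1", 0): 100, ("conv1", 1): 100, ("conv2", 0): 200}
pruned, rem = prune_until_target(scores, 400, fpc, target_ratio=0.5)
print(pruned)  # [('conv1', 1), ('conv2', 0)]
```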

(5) Computation speed is proportional to the amount of computation, whose indicator is the number of floating-point operations (FLOPs). The more convolution kernel channels a convolution operation has, and the larger its input feature map, the higher its FLOPs. The invention reduces the amount of computation, and thus speeds up the inference stage, by pruning convolution kernel channels. The stop condition of the pruning process is that the FLOPs ratio between the new model and the initial model satisfies the required speed-up.

FLOPs = Σ_{i=1}^{L} h_i · w_i · c_i · k_i² · n_i

where L is the total number of layers of the neural network, i is the index of a neural network layer, h, w and c are the height, width and depth of the input feature map of the current layer, n is the depth of the output feature map, and k is the size of the convolution kernel.
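The FLOPs count above is straightforward to compute per layer (counting one multiply-accumulate as one operation is a common convention assumed here, not stated in the patent):

```python
def conv_flops(layers):
    """FLOPs of a stack of convolutional layers, following
    FLOPs = sum_i h_i * w_i * c_i * k_i^2 * n_i.

    Each layer is a dict with input height h, width w, depth c,
    kernel size k and output depth n.
    """
    return sum(L["h"] * L["w"] * L["c"] * L["k"] ** 2 * L["n"] for L in layers)

layers = [{"h": 32, "w": 32, "c": 3, "k": 3, "n": 16},
          {"h": 16, "w": 16, "c": 16, "k": 3, "n": 32}]
print(conv_flops(layers))  # 1622016
```

Pruning one channel reduces n of its own layer and c of the next layer, which is why the FLOPs ratio falls faster than the channel count alone would suggest.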

The solid-line boxes in Figure 4 represent convolution operations; the parameters inside each box are the kernel width, kernel height, number of input channels, and number of kernels. There are two types of residual blocks. In the first type the skip connection contains no convolution, as shown by dashed boxes B and D in the figure; only the first and second convolutions of such a block are pruned, and the third is left untouched to ensure that the channel counts before and after the skip connection remain consistent. In the second type the skip connection does contain a convolution, as shown by dashed boxes A and C; in theory its third convolution layer could be pruned, provided the number of kernels in the skip connection is changed at the same time, but this would alter the input dimension of the next residual block. Both types of residual blocks therefore have only their first two convolution layers pruned.

Finally, it should be noted that the above descriptions are merely preferred embodiments of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments, or replace some of their technical features with equivalents. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A method for pruning a deep convolutional neural network model based on grey relational analysis, characterized in that it comprises:

performing data augmentation on target data to obtain more training data;

training an untrained initial network model with the training data to obtain a set of model parameters that fit the training data, as an experimental model for pruning;

quantifying the importance of each convolution kernel in the experimental model by grey relational analysis to obtain a quantized importance value for each convolution kernel;

obtaining the importance of all convolution kernels from the quantized importance values, and taking the least important convolution kernel as the target convolution kernel;

repeatedly pruning the target convolution kernel and the next-layer convolution kernels related to the target convolution kernel until a stop condition is satisfied.

2. The method for pruning a deep convolutional neural network model based on grey relational analysis according to claim 1, characterized in that the target data is image data.

3. The method according to claim 2, characterized in that the data augmentation comprises horizontal flipping or brightness fine-tuning.

4. The method according to claim 1, characterized in that training the untrained initial network model with the training data comprises: training the untrained initial network model with the training data by stochastic gradient descent, so that the value of the loss function reaches the global minimum.

5. The method according to claim 1, characterized in that the stop condition is based on the number of floating-point operations (FLOPs):

FLOPs = Σ_{i=1}^{L} h_i · w_i · c_i · k_i² · n_i

where L is the total number of layers of the neural network, i is the index of a neural network layer, h, w and c are the height, width and depth of the input feature map of the current layer, n is the depth of the output feature map, and k is the size of the convolution kernel.

6. The method according to claim 1, characterized in that quantifying the importance of each convolution kernel in the experimental model by grey relational analysis to obtain a quantized importance value for each convolution kernel comprises:

converting the feature layer of each convolution kernel in the experimental model into a two-dimensional matrix by global average pooling;

merging the two-dimensional matrices obtained by global average pooling of all convolution kernel feature layers into an analysis matrix;

selecting a reference sequence within the analysis matrix;

calculating the relational coefficients between the comparison sequences of the analysis matrix and the reference sequence;

calculating the importance of the reference sequence based on the relational coefficients;

obtaining a quantized convolution kernel channel importance value based on the importance of the reference sequence.

7. The method according to claim 6, characterized in that obtaining the quantized convolution kernel channel importance value based on the importance of the reference sequence comprises: adding a regularization function over the layers based on the importance of the reference sequence; and obtaining from the regularization function a quantized convolution kernel channel importance value s_i^j that is independent of the layer.

8. The method according to claim 6 or 7, characterized in that the relational coefficient is:

ξ_k(l) = ( min_k min_l |x_0(l) − x_k(l)| + ρ · max_k max_l |x_0(l) − x_k(l)| ) / ( |x_0(l) − x_k(l)| + ρ · max_k max_l |x_0(l) − x_k(l)| )

where ξ_k(l) denotes the degree of association between the k-th comparison sequence and the reference sequence, ρ is the resolution coefficient, and x_0(l) and x_k(l) are elements of the analysis matrix.

9. The method according to claim 8, characterized in that calculating the importance of the reference sequence based on the relational coefficients comprises: obtaining the average relational degree between the comparison sequences and the reference sequence from the relational coefficients; the higher the average relational degree, the more similar the features extracted by the convolution kernel channel represented by the reference sequence are to the features extracted by the remaining channels, i.e., the lower the importance of the convolution kernel channel represented by the reference sequence.

10. The method according to claim 9, characterized in that:

s_i^j = γ_i^j / sqrt( Σ_{j=1}^{n_i} (γ_i^j)² )

where i is the layer index, j is the index of the convolution kernel channel, and γ_i^j represents the importance of the j-th convolution kernel channel.



