
CN107679539B - A Method for Integrating Local Information and Global Information of a Single Convolutional Neural Network Based on Local Receptive Fields

Info

Publication number: CN107679539B (application CN201710842145.XA)
Authority: CN (China)
Inventors: 文戈 (Ge Wen), 蔡登 (Deng Cai), 何晓飞 (Xiaofei He)
Assignee: Zhejiang University (ZJU)
Filed: 2017-09-18
Application published as CN107679539A: 2018-02-09
Grant published as CN107679539B: 2019-12-10
Legal status: Active

Classifications

    • G06V 10/50 - Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; projection analysis
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods


Abstract

The invention discloses a method for integrating local information and global information in a single convolutional neural network based on local receptive fields, comprising the following steps. Step 1: given a convolutional neural network model, compute the receptive-field size in the original image corresponding to each layer's feature map. Step 2: according to the receptive-field size of each layer, select one layer to split, so as to balance local and global information. Step 3: weighing information content against computational cost, choose a splitting scheme and a number of splits, and split the feature map of the selected layer. Step 4: perform dimension matching on the split feature maps. Step 5: stack all layers after the selected layer, including the loss-function layer, onto each split sub-feature map, then train. The proposed convolutional-network reconstruction algorithm lets local and global information be learned simultaneously within one network model, improving the model's expressive power while incurring only a small increase in computation.

Description

A Method for Integrating Local Information and Global Information of a Single Convolutional Neural Network Based on Local Receptive Fields

Technical Field

The invention relates to the field of image processing, and in particular to a method for integrating local information and global information of a single convolutional neural network based on local receptive fields.

Background

In 2012, a convolutional neural network won the ImageNet large-scale image recognition competition with an overwhelming margin over traditional algorithms. In recent years, convolutional neural networks have been applied ever more widely across computer vision, including image recognition, object detection, and image segmentation.

A convolutional neural network is a multi-layer neural network model built by stacking convolutional layers, pooling layers, a loss-function layer, and the like. When an image is fed into the network, the local receptive field in the original image that corresponds to a fixed-size region of each layer's feature map grows as the depth increases.

Owing to the structural characteristics of convolutional neural networks, global information is exploited well, while local detail information is usually hard to learn. To integrate local and global information simultaneously, researchers currently tend to extract multiple sub-blocks from the original image and then train a separate convolutional neural network model for each sub-block. At test time, every image and each of its sub-blocks must pass through the corresponding model to extract features, and all extracted features are then averaged or concatenated to form the image's final feature. This approach has several limitations: (1) feature-extraction time grows linearly with the number of models, and at test time in particular, excessive extraction time hurts the performance of the deployed model; (2) the gain in final accuracy shrinks as models are added: the gain from one model to two is usually larger than the gain from two models to a dozen; (3) choosing the sub-blocks requires substantial manual effort, and concatenating ill-chosen sub-blocks can even lower the final accuracy.

Summary of the Invention

To address the limitations of the existing approach of integrating local and global information with multiple models trained on crops of the original image, the present invention provides a method for integrating local and global information in a single convolutional neural network based on local receptive fields. It allows local and global information to be learned simultaneously within one network model, improving the model's expressive power while incurring only a small increase in computation.

A method for integrating local information and global information of a single convolutional neural network based on local receptive fields, comprising:

Step 1: given a convolutional neural network model, compute the receptive-field size in the original image corresponding to each layer's feature map;

Step 2: according to the receptive-field size of each layer, select one layer to split, so as to balance local and global information;

Step 3: weighing information content against computational cost, choose a splitting scheme and a number of splits, and split the feature map of the selected layer;

Step 4: perform dimension matching on the split feature maps;

Step 5: stack all layers after the selected layer, including the loss-function layer, onto each split sub-feature map, then train, completing the integration of local and global information.

For convenience, the input (image) layer is denoted layer 0, and the subsequent layer at depth i is denoted layer i.

In step 1, the receptive-field size can be computed by iterating layer by layer:

For a region of size H_N × W_N in the layer-N feature map, its corresponding size H_{N-1} × W_{N-1} in the layer-(N-1) feature map is readily computed from the layer-N network parameters such as kernel size and stride. Iterating layer by layer down to the input layer (layer 0) yields H_0 × W_0, the desired receptive-field size.

Here H_N and W_N are the height and width of the region in the layer-N feature map, H_{N-1} and W_{N-1} are the height and width of the corresponding region in the layer-(N-1) feature map, and H_0 and W_0 are the height and width of the corresponding region in the layer-0 feature map.
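As a concrete illustration (not part of the patent text), the sketch below implements this backward iteration using the standard interior-region recurrence H_{N-1} = stride · (H_N − 1) + kernel; the patent's own worked example further trims one side per convolution for regions touching the image border. The layer list here is an assumed toy configuration.

```python
# A minimal sketch of step 1: map a region in the top feature map back to the
# input image, iterating from layer N down to layer 1. Each layer is an
# assumed (kernel_size, stride) pair; interior regions follow
# H_{N-1} = stride * (H_N - 1) + kernel.
def receptive_field(layers, h_n=1, w_n=1):
    h, w = h_n, w_n
    for kernel, stride in reversed(layers):
        h = stride * (h - 1) + kernel
        w = stride * (w - 1) + kernel
    return h, w

# Toy stack: two 3x3/1 convolutions followed by a 2x2/2 max pooling.
layers = [(3, 1), (3, 1), (2, 2)]
print(receptive_field(layers))  # (6, 6): one pooled unit sees a 6x6 input patch
```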

Preferably, in step 2, the invention selects a convolutional layer that lies in the middle of the network and is followed by a pooling layer.

In step 3, the original feature map can be split, uniformly or according to key points, into a suitable number of sub-feature maps.

In step 4, several methods can enlarge a feature map for dimension matching, including linear interpolation and adding a deconvolution layer; for convolutional neural network models of suitable structure, removing a pooling layer is a further option.

By analyzing the receptive-field size in the original image corresponding to a fixed-size region of each layer's feature map, the invention selects a suitable layer for splitting and dimension matching, and then stacks the subsequent layers after the split layer, thereby integrating local and global information within a single convolutional neural network. This has the following advantages:

(1) Local and global information are integrated into a single convolutional neural network model;

(2) Because the first several layers remain unchanged, global information is preserved, yielding better results than splitting the original image;

(3) Because the first several layers remain unchanged and share weights, the increase in computation is smaller than when splitting the original image.

Brief Description of the Drawings

Figure 1 is a flowchart of the method of the invention for integrating local and global information in a single convolutional neural network based on local receptive fields;

Figure 2 is a structural diagram of the method of the invention for integrating local and global information in a single convolutional neural network based on local receptive fields.

Detailed Description

The invention is further described below with reference to the drawings and an example.

The method provided by the invention is implemented on a Linux system using the deep-learning framework Caffe; the workflow is shown in Figure 1 and the structure in Figure 2. Table 1 shows the base convolutional neural network model used in the experiments. Its input size is 100*100, and it is built by stacking five groups of similar modules, each consisting of two convolutional layers followed by a pooling layer, or of three convolutional layers. The last column of Table 1 shows, after the current layer's feature map is split uniformly into 2*2 feature maps, the receptive-field size in the original image corresponding to each split feature map. Taking this model as the example, the specific steps of the invention are as follows:

(1) Given the convolutional neural network model, compute the receptive-field size in the original image corresponding to each layer's feature map.

Take the Conv22 layer as an example. For a corner region of size H*W in Conv22: Conv22 and Conv21 are convolutional layers with kernel size 3*3 and stride 1, so the region corresponds to an (H+1)*(W+1) region in Conv21 and an (H+1+1)*(W+1+1) region in Pool1.

Table 1

Name     Type              Kernel/stride   Output size    Receptive field
Conv11   convolution       3×3/1           100×100×32     51×51
Conv12   convolution       3×3/1           100×100×64     52×52
Pool1    max pooling       2×2/2           50×50×64       52×52
Conv21   convolution       3×3/1           50×50×64       54×54
Conv22   convolution       3×3/1           50×50×128      56×56
Pool2    max pooling       2×2/2           25×25×128      58×58
Conv31   convolution       3×3/1           25×25×96       62×62
Conv32   convolution       3×3/1           25×25×192      66×66
Pool3    max pooling       2×2/2           13×13×192      70×70
Conv41   convolution       3×3/1           13×13×128      78×78
Conv42   convolution       3×3/1           13×13×256      86×86
Pool4    max pooling       2×2/2           7×7×256        94×94
Conv51   convolution       3×3/1           7×7×160        100×100
Conv52   convolution       3×3/1           7×7×320        100×100
Conv5    convolution       7×7/1           1×1×320        100×100
Dropout  dropout (40%)     -               1×1×320        -
FC10575  fully connected   -               10575          -
Loss     Softmax           -               10575          -

(The last column is the receptive field of one sub-feature map after a uniform 2×2 split of that layer's feature map.)

Pool1 is a pooling layer with kernel size 2*2 and stride 2, so the Conv22 region corresponds to a 2(H+1+1)*2(W+1+1) region in Conv12. Continuing in the same way, the Conv22 region corresponds to (2(H+2)+2)*(2(W+2)+2) in the original image. Therefore, after the 50*50 Conv22 output is split uniformly into 2*2, i.e. four 25*25 feature maps, each split feature map corresponds to a receptive field of (2(25+2)+2)*(2(25+2)+2) = 56*56 in the original image.
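The same arithmetic can be checked in a few lines (an illustrative sketch of the corner-region rule used above, not the patent's implementation):

```python
# Mapping one 25x25 quadrant of the Conv22 feature map back to the input,
# using the corner-region rule: each 3x3/1 convolution adds 1 (one side lies
# on the border), and a 2x2/2 pooling exactly doubles the region.
h = 25          # quadrant of the 50x50 Conv22 output
h = h + 1       # Conv22, 3x3/1 (corner)
h = h + 1       # Conv21, 3x3/1 (corner)
h = 2 * h       # Pool1, 2x2/2
h = h + 1       # Conv12, 3x3/1 (corner)
h = h + 1       # Conv11, 3x3/1 (corner)
print(h)        # 56, matching the 56x56 entry for Conv22 in Table 1
```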

Note that the above derivation assumes the split feature map lies at a corner, i.e. contains the outermost edge of the original feature map in both the height and width directions; if a split feature map does not lie at a corner of the original feature map, the receptive-field calculation must be adjusted slightly.

(2) According to the receptive-field size of each layer, select an appropriate layer to split, so as to balance local and global information.

In this example, the Conv22 layer is selected for splitting. Other layers are also feasible.

(3) Weighing information content against computational cost, choose an appropriate splitting scheme and number of splits, and split the feature map of the selected layer.

In this example, since the input image and all feature maps have a 1:1 aspect ratio, and to avoid the distorting upsampling that inconsistent height and width would cause, the Conv22 feature map is split uniformly into N*N equally sized square sub-feature maps. With N=3 the modified model would cost roughly 9 times as much computation as the original, greatly reducing efficiency, so N=2 is chosen. The original Conv22 output size is 50*50, so the split yields four sub-feature maps of size 25*25.
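As an illustration of this split (a sketch with assumed NCHW layout and names, independent of the patent's Caffe implementation):

```python
import numpy as np

# Step 3 sketch: uniformly split a feature map into 2x2 non-overlapping
# sub-feature maps along the spatial dimensions.
def split_2x2(fmap):
    """fmap: (batch, channels, H, W) with H and W even."""
    _, _, h, w = fmap.shape
    hh, hw = h // 2, w // 2
    return [fmap[:, :, i*hh:(i+1)*hh, j*hw:(j+1)*hw]
            for i in range(2) for j in range(2)]

conv22 = np.random.randn(1, 128, 50, 50).astype(np.float32)
print([s.shape for s in split_2x2(conv22)])  # four (1, 128, 25, 25) sub-maps
```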

(4) Perform dimension matching on the split feature maps.

Dimension-matching methods include linear interpolation and adding a deconvolution layer. In the implementation, linear interpolation is likewise realized as a deconvolution layer whose learning rate is set to 0 and whose weights use a specific initialization. In this example, the Conv22 convolutional layer is followed by the Pool2 pooling layer, whose output feature map has half the height and width of its input, which exactly matches the split chosen in step (3); dimension matching can therefore be done simply by deleting the Pool2 layer.
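For models without this coincidence, a sub-feature map has to be enlarged back to the size the following layer expects. The sketch below uses nearest-neighbor repetition as a simple stand-in for the linear-interpolation and deconvolution options named above (an assumption for illustration, not the patent's exact operator):

```python
import numpy as np

# Step 4 sketch: enlarge a sub-feature map by a factor of 2 in height and
# width so its size matches what the next layer originally received.
def upsample_2x(x):
    """x: (channels, H, W) -> (channels, 2H, 2W) by pixel repetition."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

sub = np.random.randn(128, 25, 25).astype(np.float32)
print(upsample_2x(sub).shape)  # (128, 50, 50)
```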

(5) Stack all layers after the selected layer, including the loss-function layer, onto each split sub-feature map, then train, completing the integration of local and global information.
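Putting steps (3)-(5) together, the resulting forward pass looks roughly like the following sketch (assumed names and identity stand-ins for the real layers; the actual implementation is a modified Caffe network, not this Python):

```python
import numpy as np

# Shared trunk up to Conv22, a 2x2 split, and one replicated tail per
# sub-feature map (Pool2 removed for dimension matching); the per-branch
# features are concatenated, the fusion that worked best in Table 2.
def forward(image, trunk, tails):
    fmap = trunk(image)                           # e.g. (128, 50, 50)
    _, h, w = fmap.shape
    subs = [fmap[:, i*h//2:(i+1)*h//2, j*w//2:(j+1)*w//2]
            for i in range(2) for j in range(2)]  # four (128, 25, 25) maps
    return np.concatenate([tail(sub) for tail, sub in zip(tails, subs)])

# Toy usage with stand-in layers:
img = np.random.randn(3, 100, 100).astype(np.float32)
trunk = lambda x: np.random.randn(128, 50, 50).astype(np.float32)
tails = [lambda s: s.mean(axis=(1, 2)) for _ in range(4)]
print(forward(img, trunk, tails).shape)  # (512,)
```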

The method of the invention concerns integrating local and global information in a single convolutional neural network based on local receptive fields. By modifying an existing convolutional neural network model, local and global information can be learned simultaneously within a single model; the method is readily portable and widely applicable.

To demonstrate the effectiveness of the method, comparative face-verification and face-identification experiments were run under the standard LFW evaluation protocol and the BLUFR protocol. Face verification asks, given two face images, whether they show the same person. Face identification asks, given a probe face image and a gallery, for the gallery photos of the same person as the probe, returning a rejection if none exists. The LFW dataset contains over ten thousand face images crawled from the Internet, with the name of the main individual in each image manually labeled. Under the standard protocol, 6,000 face pairs are divided into ten groups of 600 pairs each, 300 of the same person and 300 of different people. The evaluated algorithm must use cross-validation: train and select a threshold on nine groups, test on the remaining group, and repeat ten times. The average accuracy over the ten groups is the algorithm's final score.

The BLUFR protocol uses all LFW images and evaluates the algorithm's open-set face-identification accuracy (DIR) at a low false accept rate (FAR).

All comparative experiments use the CASIA-WebFace dataset as the training set for the convolutional neural network models. It contains 494,414 face images crawled from the IMDb website, belonging to 10,575 individuals manually verified to be disjoint from LFW.

Table 2

Table 2 shows the results under the standard LFW evaluation protocol.

A. Base model. The base model, without any splitting, achieves 98.02% accuracy. In addition, the number of channels in conv22 and all subsequent convolutional layers is doubled to obtain base model-2x, matching the computational cost of the model that splits the conv22 layer, for comparison. That model achieves 98.10% accuracy.

B. Splitting the image layer. Because each split feature map in the conv22 layer corresponds to a 56*56 receptive field in the original image, two sets of experiments were run on split images: 1. uniform splitting into four overlapping 56*56 regions; 2. uniform splitting into four non-overlapping 50*50 regions. Each split sub-block is scaled to 100*100 and fed into the base model. Owing to the loss of global information, testing the feature of each split image alone achieves only about 95% accuracy. The element-wise maximum cannot exploit all of the information and discards part of it, so accuracy drops. Averaging improves accuracy over a single feature. Concatenation keeps the information of every individual feature, improves accuracy more than averaging, and exceeds the base model. Having more information, splitting into four 56*56 regions generally outperforms splitting into four 50*50 regions.
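The three fusion operators compared here can be stated in a few lines (a sketch; the 320-dimensional feature size is taken from Conv5 in Table 1, and the random vectors stand in for real per-sub-block embeddings):

```python
import numpy as np

# Element-wise max, element-wise mean, and concatenation of the four
# per-sub-block feature vectors.
feats = [np.random.randn(320).astype(np.float32) for _ in range(4)]
fused_max = np.max(feats, axis=0)    # (320,): discards information, accuracy drops
fused_avg = np.mean(feats, axis=0)   # (320,): modest improvement
fused_cat = np.concatenate(feats)    # (1280,): keeps everything, works best
```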

C. Splitting the conv52 layer. Every split feature map in conv52 already receives the information of the whole original image, so testing the feature of each split map alone achieves about 98% accuracy. However, because the number of connections between conv52 and conv5 shrinks, all variants fall below the base model. Maximum and averaging add no extra information but lose necessary information, so accuracy clearly drops relative to a single feature. Concatenation improves on a single feature and roughly matches the base model.

D. Splitting the conv22 layer. As with splitting the image layer, each split conv22 feature on its own fails to exploit all the information in the original image and achieves only about 96% accuracy, yet all variants beat splitting the image layer. This is because when the conv22 layer is split, conv11, conv12, and conv21 remain unchanged, and their shared convolution kernels pass along some global information indirectly. Maximum loses part of the information and accuracy drops sharply. Averaging improves considerably over a single feature, approaching the base model. Concatenation keeps the information of every individual feature and improves accuracy greatly, far exceeding the base model at 99.02%.

The last three rows of Table 2 compare different dimension-matching methods (splitting the conv22 layer). The time column is the time needed to extract features for one image. Bilinear interpolation and deconvolution both do slightly better than pooling removal, but the improvement is small, while their feature-extraction time is nearly twice that of pooling removal. Balancing accuracy and efficiency, pooling removal is the better dimension-matching choice.

Table 3

Table 3 shows the results under the LFW BLUFR protocol. As with the standard protocol, splitting the conv22 layer substantially improves the original model, cutting the error rate by nearly 50% on the VR metric.

Table 4

Table 4 compares the invention with other state-of-the-art methods on LFW.

The vast majority of algorithms that reach 99% accuracy on LFW crop the original image. FaceNet and Baidu are two exceptions, but their training sets are private and far larger than the one used here. CASIA and MM-DFR use the same training set as the invention; compared with CASIA, the invention reduces the error threefold, and MM-DFR reaches similar accuracy but needs much more computation. To the inventors' knowledge, this is the first method to reach 99% accuracy on LFW using a public training set and only the original image, without sub-blocks.

The above is only a preferred embodiment of the invention, and the scope of protection is not limited to it; any technical solution based on the principle of the invention falls within the scope of protection. For those skilled in the art, improvements and refinements made without departing from the principle of the invention are also regarded as within the scope of protection.

Claims (5)

1. A method for integrating local information and global information of a single convolutional neural network based on local receptive fields, comprising:

Step 1: given a convolutional neural network model, computing the receptive-field size in the original image corresponding to each layer's feature map;

Step 2: according to the receptive-field size of each layer, selecting one layer to split, so as to balance local and global information;

Step 3: weighing information content against computational cost, choosing a splitting scheme and a number of splits and splitting the feature map of the selected layer, the splitting scheme being to split that layer's feature map uniformly into 2*2 feature maps according to key points;

Step 4: performing dimension matching on the split feature maps;

Step 5: stacking all layers after the selected layer, including the loss-function layer, onto each split sub-feature map, then training, completing the integration of local and global information.

2. The method according to claim 1, wherein in step 1 the receptive-field size is computed by layer-by-layer iteration: for a region of size H_N*W_N in the layer-N feature map, its corresponding size H_{N-1}*W_{N-1} in the layer-(N-1) feature map is computed from the layer-N network parameters; iterating layer by layer down to the input layer (layer 0) yields H_0*W_0, the desired receptive-field size; where H_N and W_N are the height and width of the region in the layer-N feature map, H_{N-1} and W_{N-1} are the height and width of the corresponding region in the layer-(N-1) feature map, and H_0 and W_0 are the height and width of the corresponding region in the layer-0 feature map.

3. The method according to claim 2, wherein the network parameters are kernel size and stride.

4. The method according to claim 1, wherein in step 2 the selected layer is a convolutional layer located in the middle of the network and followed by a pooling layer.

5. The method according to claim 1, wherein in step 4 the split feature maps are enlarged for dimension matching by linear interpolation or by adding a deconvolution layer.




Legal Events

Code  Title
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant