CN109101975B - Image semantic segmentation method based on full convolution neural network - Google Patents
- Publication number
- CN109101975B
- Authority
- CN
- China
- Prior art keywords
- semantic segmentation
- feature
- feature map
- image
- pooling layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an image semantic segmentation method based on a fully convolutional neural network, relating to the fields of image semantic segmentation and deep learning. The method comprises the following steps: selecting a training dataset; constructing and training a classification model that maps images to class labels and using it as the front-end network of the semantic segmentation model; downsampling the feature map output by each block of the front-end network to a uniform size through a detail-preserving pooling layer, concatenating the four resulting feature maps, recalibrating them with a feature recalibration module, and passing the recalibrated feature map to the back-end network. The back-end network is mainly responsible for upsampling; after upsampling, the features pass through a variable-weight global pooling layer, the cross-entropy against the semantically annotated images of the training dataset is computed, and the error is backpropagated. The invention solves the problem of low image segmentation accuracy in the prior art.
Description
Technical field
The present invention relates to the fields of image semantic segmentation and deep learning, and in particular to an image semantic segmentation method based on a fully convolutional neural network.
Background
Semantic segmentation is an important problem in computer vision. Image semantic segmentation assigns a label (category) to every pixel, so it can be regarded as a dense classification problem.
In recent years, the vast majority of state-of-the-art image semantic segmentation methods have been based on fully convolutional neural networks. A typical semantic segmentation network uses an encoder-decoder structure: the encoder is an image downsampling process that extracts coarse semantic features of the image, and it is followed by a decoder, an image upsampling process that upsamples the downsampled features back to the original dimensions of the input image.
Pooling is a key component of the downsampling process in convolutional neural networks: it reduces the parameter scale, increases invariance to certain distortions and enlarges the receptive field. However, pooling is inherently lossy, so when it is used to downsample images for semantic segmentation it discards semantic information, which lowers the accuracy of the segmentation results.
In deep convolutional neural networks, strided convolutions are often used in place of pooling layers for downsampling. A strided convolution only considers the node at a fixed position in each local neighborhood, regardless of how important its activation is, so from the perspective of image downsampling it also distorts features. Fully convolutional neural networks underpin state-of-the-art image semantic segmentation algorithms for a large number of applications, and innovations in their network structure have focused mainly on improving spatial encoding or network connectivity to facilitate gradient flow.
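A toy illustration of the fixed-position problem, not taken from the patent: the Python sketch below compares stride-2 subsampling (what a 1×1 convolution with unit weight and stride 2 reduces to) with 2×2 max pooling on a 4×4 feature map whose strong responses fall off the sampling grid. The function names are hypothetical.

```python
def strided_subsample(x, stride=2):
    """Keep only the top-left node of each stride x stride neighbourhood,
    i.e. a 1x1 convolution with weight 1 applied with the given stride."""
    return [row[::stride] for row in x[::stride]]


def max_pool(x, k=2):
    """Keep the strongest activation of each non-overlapping k x k neighbourhood."""
    h, w = len(x), len(x[0])
    return [[max(x[i + di][j + dj] for di in range(k) for dj in range(k))
             for j in range(0, w, k)]
            for i in range(0, h, k)]


# A 4x4 feature map whose strong responses sit off the stride-2 sampling grid.
fmap = [
    [0, 0, 0, 0],
    [0, 9, 0, 7],
    [0, 0, 0, 0],
    [0, 5, 0, 8],
]
print(strided_subsample(fmap))  # [[0, 0], [0, 0]]
print(max_pool(fmap))           # [[9, 7], [5, 8]]
```

Max pooling keeps the responses but only the single strongest one per window; the detail-preserving pooling adopted by the invention is a learnable compromise between these extremes.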
Summary of the invention
The purpose of the present invention is to provide an image semantic segmentation method based on a fully convolutional neural network that solves the problem of low image segmentation accuracy in the prior art.
The technical scheme of the present invention is as follows:
An image semantic segmentation method based on a fully convolutional neural network comprises the following steps:
Step 1: Select a training dataset.
Step 2: Build and train a classification model that maps images to class labels, and use it as the front-end network of the semantic segmentation model.
The front-end network of the semantic segmentation model consists of Conv1, Conv2_x, Conv3_x and Conv4_x, each of which contains multiple convolutional layers and is followed by a detail-preserving pooling layer.
Step 3: Using the trained front-end network as a basis, construct the back-end network of the semantic segmentation model.
The back-end network consists of detail-preserving pooling layers, a feature recalibration module, a 1×1 convolutional layer, Conv5_x, Conv6_x, Conv7_x, a variable-weight global pooling layer and upsampling layers. The outputs of Conv1, Conv2_x and Conv3_x each pass through one of three detail-preserving pooling layers, are concatenated with the output of Conv4_x, and are fed together into the feature recalibration module. Conv5_x, Conv6_x and Conv7_x are each preceded by an upsampling layer and each comprise convolutional layers, a batch normalization layer and a rectified linear unit (ReLU); through skip connections, the outputs of Conv5_x, Conv6_x and Conv7_x are concatenated with the output feature maps of Conv3_x, Conv2_x and Conv1, respectively.
The output of the feature recalibration module passes through a 1×1 convolutional layer to obtain a feature map, which is upsampled and then connected to Conv5_x.
Here, the variable-weight global pooling layer adds a weight vector to the 1×1 convolution used in global average pooling; its parameters are initialized from a standard Gaussian distribution, and the per-pixel weights are continually updated during error backpropagation.
Step 4: Train the entire image semantic segmentation model.
Step 5: Input a new image, perform a single forward pass through the trained deep neural network model, and output the predicted semantic segmentation result end to end.
Specifically, the front-end network of the semantic segmentation model contains 33 residual structures, each consisting of a 1×1 convolution, a 3×3 convolution, another 1×1 convolution and a shortcut connection.
Specifically, the detail-preserving pooling layer after Conv1 downsamples the output feature map of Conv1 by a factor of 8, the one after Conv2_x downsamples the output feature map of Conv2_x by a factor of 4, and the one after Conv3_x downsamples the output feature map of Conv3_x by a factor of 2.
Specifically, the detail-preserving pooling layer works as follows:
The output at each position p is computed from the input feature map I as

$$\mathcal{D}_{\alpha,\beta}(I)[p] = \frac{1}{\sum_{q' \in \Omega_p} \omega_{\alpha,\beta}[p,q']} \sum_{q \in \Omega_p} \omega_{\alpha,\beta}[p,q]\, I[q]$$

where $\mathcal{D}_{\alpha,\beta}(I)[p]$ is the value at output position p after the input feature map passes through the detail-preserving pooling layer.

The spatial averaging weight $\omega_{\alpha,\beta}[p,q]$ of an input node is

$$\omega_{\alpha,\beta}[p,q] = \alpha + \rho_\beta\left(I[q] - \tilde{I}_F[p]\right)$$

where α is the bias term and β is the reward exponent. $\rho_\beta(\cdot)$ is an inverse bilateral filter function used to weight the input points in the neighborhood $\Omega_p$; β scales down the dynamic range of the reward function, and as β→0 the operation reduces to a simple neighborhood average.

$\tilde{I}_F$ is the linearly downscaled input, specifically:

$$\tilde{I}_F[p] = \sum_{q \in \tilde{\Omega}_p} F[q]\, I[q]$$

where F is a learnable, unnormalized 2D filter over the neighborhood $\tilde{\Omega}_p$, which is 3×3 in size.
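The detail-preserving pooling computation described above can be sketched for a single channel in plain Python. This is illustrative only: the concrete reward function (the symmetric form from the detail-preserving pooling literature), the window alignment and the fixed rather than learned α, β and F are assumptions, and the helper names are hypothetical.

```python
import math


def rho_sym(x, beta, eps=1e-3):
    """Inverse-bilateral reward (sqrt(x^2 + eps^2)) ** beta.

    Assumption: the patent names rho_beta but does not spell out its formula.
    """
    return math.sqrt(x * x + eps * eps) ** beta


def dpp_pool(I, F, alpha=1.0, beta=2.0, k=3, stride=2):
    """Detail-preserving pooling of one channel I (a list of rows).

    Simplifications (assumptions): Omega_p and the neighbourhood used for
    I~_F are the same k x k "valid" window, and alpha, beta and F are fixed
    numbers here instead of parameters learned by backpropagation.
    """
    H, W = len(I), len(I[0])
    out = []
    for top in range(0, H - k + 1, stride):
        row = []
        for left in range(0, W - k + 1, stride):
            window = [(top + a, left + b) for a in range(k) for b in range(k)]
            # I~_F[p]: linearly downscaled value, unnormalised filter F over the window
            i_tilde = sum(F[a][b] * I[top + a][left + b]
                          for a in range(k) for b in range(k))
            # omega[p, q] = alpha + rho_beta(I[q] - I~_F[p])
            weights = [alpha + rho_sym(I[qi][qj] - i_tilde, beta) for qi, qj in window]
            # D(I)[p] = sum_q omega[p, q] * I[q] / sum_q' omega[p, q']
            row.append(sum(w * I[qi][qj] for w, (qi, qj) in zip(weights, window))
                       / sum(weights))
        out.append(row)
    return out


box = [[1 / 9] * 3 for _ in range(3)]  # a plain averaging filter used as F
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(dpp_pool(grid, box, beta=0.0))   # beta = 0: plain average, [[5.0]]
print(dpp_pool(spike, box, beta=2.0))  # the spike dominates (about 7.2 versus a mean of 1.0)
```

With β = 0 every weight equals α + 1, so the layer degenerates to average pooling; larger β rewards values that deviate from the smooth estimate $\tilde{I}_F$, which is how fine detail survives downsampling.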
Specifically, the feature recalibration module is a network module that combines spatial feature recalibration with channel feature recalibration.
Specifically, the entire image semantic segmentation model is trained as follows:
Step 4.1: Preprocess the images in the training dataset by cropping them to a fixed size.
Step 4.2: Initialize the entire image semantic segmentation model.
Step 4.3: Augment the training data by flipping, scaling and rotation.
Step 4.4: Use the sum of the per-pixel cross-entropy losses as the loss function, backpropagate the error with stochastic gradient descent and update the model parameters to obtain the trained semantic segmentation model.
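The Step 4.4 loss, the sum over pixels of the per-pixel cross-entropy, can be sketched in plain Python as follows. The K×H×W score layout, the function name and the `ignore_index=255` convention (PASCAL VOC marks object boundaries with 255) are assumptions; the patent does not say how unlabeled pixels are handled.

```python
import math


def pixelwise_cross_entropy_sum(logits, labels, ignore_index=255):
    """Sum of per-pixel cross-entropy losses.

    logits: K x H x W nested lists of raw class scores.
    labels: H x W nested lists of integer class indices.
    Pixels labelled ignore_index are skipped (assumed VOC convention).
    """
    num_classes = len(logits)
    total = 0.0
    for i, label_row in enumerate(labels):
        for j, y in enumerate(label_row):
            if y == ignore_index:
                continue
            scores = [logits[k][i][j] for k in range(num_classes)]
            m = max(scores)  # subtract the max for a numerically stable log-sum-exp
            log_z = m + math.log(sum(math.exp(s - m) for s in scores))
            total += log_z - scores[y]  # -log softmax(scores)[y]
    return total


uniform = [[[0.0, 0.0]], [[0.0, 0.0]]]  # two classes, one row of two pixels
print(pixelwise_cross_entropy_sum(uniform, [[0, 1]]))  # equals 2 * ln 2
```

Because the loss is a sum rather than a mean, its gradient grows with the number of labeled pixels in a crop, which interacts with the choice of initial learning rate.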
With the above scheme, the beneficial effects of the present invention are as follows:
(1) The image semantic segmentation model of the present invention introduces detail-preserving pooling layers, which retain more image detail during downsampling. Detail-preserving pooling is an adaptive pooling method that amplifies spatial variation and preserves important structural details; just as importantly, its parameters can be learned jointly with the rest of the network.
(2) The invention introduces a feature recalibration module that recalibrates the features. Spatial feature recalibration recalibrates the importance of each spatial position across all channels and assigns it a corresponding weight, which improves segmentation accuracy; channel feature recalibration assigns high weights to important channels, emphasizing them. In short, the feature recalibration module effectively addresses the low accuracy of image semantic segmentation and the loss of detail during pooling, leading to better segmentation results.
(3) Variable-weight global pooling layer: conventional global average pooling applies the same operation, i.e. a 1×1 convolution, at the same position of every feature channel, and therefore cannot emphasize the correct class of each pixel in semantic segmentation. Adding a weight vector to the 1×1 convolution in global average pooling, initializing its parameters from a standard Gaussian distribution and continually updating the per-pixel weights during error backpropagation enables better pixel-wise classification and also speeds up convergence.
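The patent describes the variable-weight global pooling layer only in words, so the Python sketch below is one possible reading rather than the patented layer: a plain 1×1 convolution produces per-class scores, and a learnable per-pixel weight map, initialized from a standard Gaussian, rescales those scores. The weight-map interpretation and all function names are assumptions.

```python
import random


def init_pixel_weights(height, width, seed=0):
    """Per-pixel weights drawn from a standard Gaussian N(0, 1)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]


def conv1x1(features, kernels):
    """Plain 1x1 convolution: the same kernel at every position.

    features: C x H x W nested lists; kernels: K x C (one row per class).
    """
    c, h, w = len(features), len(features[0]), len(features[0][0])
    return [[[sum(k[ch] * features[ch][i][j] for ch in range(c)) for j in range(w)]
             for i in range(h)] for k in kernels]


def variable_weight_scores(features, kernels, pixel_weights):
    """Hypothetical variable-weight variant: scale each pixel's scores by its weight."""
    base = conv1x1(features, kernels)
    return [[[pixel_weights[i][j] * v for j, v in enumerate(row)]
             for i, row in enumerate(cls)] for cls in base]


def pixel_weight_grad(features, kernels, upstream):
    """dLoss/dweight[i][j] = sum_k upstream[k][i][j] * base[k][i][j]."""
    base = conv1x1(features, kernels)
    h, w = len(base[0]), len(base[0][0])
    return [[sum(upstream[k][i][j] * base[k][i][j] for k in range(len(base)))
             for j in range(w)] for i in range(h)]


feats = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]  # C=2, H=W=2
kernels = [[1.0, 1.0], [1.0, -1.0]]                          # K=2 classes
weights = [[1.0, 2.0], [3.0, 4.0]]
print(variable_weight_scores(feats, kernels, weights))
# [[[2.0, 4.0], [6.0, 8.0]], [[0.0, 0.0], [0.0, 0.0]]]
```

In training, an SGD step would update each weight as `w[i][j] -= lr * grad[i][j]`, which is the "continually updated per-pixel weights" behaviour the text describes under this interpretation.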
Description of drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is a structural diagram of the image semantic segmentation model of the present invention;
Fig. 3 is a structural diagram of the residual structure of the present invention;
Fig. 4 is a structural diagram of the feature recalibration module of the present invention;
Fig. 5 is a structural diagram of the channel feature recalibration module of the present invention;
Fig. 6 is a structural diagram of the spatial feature recalibration module of the present invention.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
To solve the problem of low image segmentation accuracy in the prior art, the present invention proposes an image semantic segmentation method based on a fully convolutional neural network that can be widely applied to general two-dimensional image semantic segmentation.
As shown in Fig. 1, the image semantic segmentation method based on a fully convolutional neural network comprises the following steps:
Step 1: Select the training dataset. In this embodiment, the 21 categories of the VOC 2012 dataset (20 object categories plus background) serve as the reference; images in the COCO dataset that contain objects of these 20 categories are collected and added to the dataset, yielding the final training and test datasets.
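The dataset-selection step can be sketched as a name mapping plus a filter. The VOC-to-COCO class-name pairs below are the standard category names of the two datasets; the `(image_id, [category names])` input format and the function name are hypothetical, and the patent does not say how COCO objects outside the 20 VOC classes are labeled (treating them as background is one common choice).

```python
# VOC 2012 object classes mapped to COCO category names (background excluded).
VOC_TO_COCO = {
    "aeroplane": "airplane", "bicycle": "bicycle", "bird": "bird",
    "boat": "boat", "bottle": "bottle", "bus": "bus", "car": "car",
    "cat": "cat", "chair": "chair", "cow": "cow",
    "diningtable": "dining table", "dog": "dog", "horse": "horse",
    "motorbike": "motorcycle", "person": "person",
    "pottedplant": "potted plant", "sheep": "sheep", "sofa": "couch",
    "train": "train", "tvmonitor": "tv",
}
COCO_TO_VOC = {coco: voc for voc, coco in VOC_TO_COCO.items()}


def select_coco_images(images):
    """Keep COCO images that contain at least one of the 20 VOC classes.

    images: iterable of (image_id, [COCO category names]) pairs, a
    hypothetical pre-extracted format. Returns {image_id: sorted VOC names}.
    """
    selected = {}
    for image_id, names in images:
        voc_names = sorted({COCO_TO_VOC[n] for n in names if n in COCO_TO_VOC})
        if voc_names:
            selected[image_id] = voc_names
    return selected


sample = [(1, ["dog", "frisbee"]), (2, ["giraffe"]), (3, ["couch", "tv", "couch"])]
print(select_coco_images(sample))  # {1: ['dog'], 3: ['sofa', 'tvmonitor']}
```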
Step 2: Build and train a classification model that maps images to class labels, and use it as the front-end network of the semantic segmentation model.
As shown in Fig. 2, the front-end network of the semantic segmentation model consists of Conv1, Conv2_x, Conv3_x and Conv4_x, each of which contains multiple convolutional layers and is followed by a detail-preserving pooling layer. A detail-preserving pooling layer is added after each of the blocks Conv2_x, Conv3_x and Conv4_x; this is an adaptive pooling method that amplifies spatial variation and preserves important structural details.
As shown in Fig. 3, the front-end network of the semantic segmentation model contains 33 residual structures, each consisting of a 1×1 convolution, a 3×3 convolution, another 1×1 convolution and a shortcut connection.
For convenience of description, the feature maps output by Conv1 (size 112×112), Conv2_x (size 56×56), Conv3_x (size 28×28) and Conv4_x (size 14×14) are denoted Res_1, Res_2, Res_3 and Res_4, respectively.
The detail-preserving pooling layer after Conv1 downsamples Res_1 by a factor of 8, the one after Conv2_x downsamples Res_2 by a factor of 4, and the one after Conv3_x downsamples Res_3 by a factor of 2.
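A quick consistency check of these numbers: each downsampling factor brings its branch to the 14×14 size of Res_4, so the four maps can be concatenated along the channel axis. The sizes and factors below come from the text; the channel widths are an assumption (those of a standard ResNet-101, which the patent does not state), used only to show how wide the concatenated input to the recalibration module would be.

```python
# Spatial sizes (height = width) stated in the embodiment.
SIZES = {"Res_1": 112, "Res_2": 56, "Res_3": 28, "Res_4": 14}
# Downsampling factor of the detail-preserving pooling layer on each branch;
# Res_4 is concatenated as-is.
FACTORS = {"Res_1": 8, "Res_2": 4, "Res_3": 2, "Res_4": 1}
# Assumption: channel widths of a standard ResNet-101 (not given in the patent).
CHANNELS = {"Res_1": 64, "Res_2": 256, "Res_3": 512, "Res_4": 1024}


def pooled_sizes(sizes, factors):
    """Size of each branch after detail-preserving pooling; all must agree."""
    out = {}
    for name, size in sizes.items():
        if size % factors[name]:
            raise ValueError(f"{name}: {size} is not divisible by {factors[name]}")
        out[name] = size // factors[name]
    if len(set(out.values())) != 1:
        raise ValueError(f"branches disagree in size: {out}")
    return out


print(pooled_sizes(SIZES, FACTORS))  # every branch is 14 x 14
print(sum(CHANNELS.values()))        # 1856 channels enter the recalibration module
```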
Step 3: Using the trained front-end network as a basis, construct the back-end network of the semantic segmentation model.
As shown in Fig. 2, the back-end network consists of detail-preserving pooling layers, a feature recalibration module, a convolutional layer, Conv5_x, Conv6_x, Conv7_x, another convolutional layer, a variable-weight global pooling layer and upsampling layers. Feature maps Res_1, Res_2 and Res_3 each pass through one of three detail-preserving pooling layers, are concatenated with the output of Conv4_x, and are fed together into the feature recalibration module. Conv5_x, Conv6_x and Conv7_x are each preceded by an upsampling layer and each comprise convolutional layers, a batch normalization layer and a ReLU; through skip connections, their outputs are concatenated with the output feature maps of Conv3_x, Conv2_x and Conv1, respectively.
For Steps 2 and 3, the detail-preserving pooling layer works as follows.

The output at each position p is computed from the input feature map I as

$$\mathcal{D}_{\alpha,\beta}(I)[p] = \frac{1}{\sum_{q' \in \Omega_p} \omega_{\alpha,\beta}[p,q']} \sum_{q \in \Omega_p} \omega_{\alpha,\beta}[p,q]\, I[q]$$

where $\mathcal{D}_{\alpha,\beta}(I)[p]$ is the value at output position p after the input feature map passes through the detail-preserving pooling layer. The spatial averaging weight of an input node in the neighborhood $\Omega_p$ is

$$\omega_{\alpha,\beta}[p,q] = \alpha + \rho_\beta\left(I[q] - \tilde{I}_F[p]\right)$$

where α is the bias term and β is the reward exponent. $\rho_\beta(\cdot)$ is an inverse bilateral filter function used to weight the input points in the neighborhood $\Omega_p$; β scales down the dynamic range of the reward function, and as β→0 the operation reduces to a simple neighborhood average.

$\tilde{I}_F$ is the linearly downscaled input, specifically:

$$\tilde{I}_F[p] = \sum_{q \in \tilde{\Omega}_p} F[q]\, I[q]$$

where F is a learnable, unnormalized 2D filter over the neighborhood $\tilde{\Omega}_p$, which is 3×3 in size.
Specifically, the feature recalibration module (shown in Fig. 4) is a network module that combines spatial feature recalibration and channel feature recalibration.
The two parts are described separately below.
As shown in Fig. 5, the spatial feature recalibration module works as follows:
(1) The original feature map $M_c \in \mathbb{R}^{H \times W \times c}$ is passed through a convolution with a 1×1 kernel and c channels (the weight of each channel is not shared and is learned during training), yielding a feature map $M' \in \mathbb{R}^{H \times W}$.
(2) M' is then passed through a sigmoid layer, which recalibrates the importance of each spatial position $M'(i,j)$, $i \in \{1,2,\dots,H\}$, $j \in \{1,2,\dots,W\}$, of $M_c$ and assigns each spatial position a weight $p(i,j)$; $p(i,j)$ is then multiplied element-wise with the original feature map $M_c$.
Finally, the feature map obtained from $M_c$ after spatial feature recalibration is

$$\hat{M}_{s}(i,j) = p(i,j)\, M_c(i,j) = \sigma\big(M'(i,j)\big)\, M_c(i,j)$$

where σ is the sigmoid function and $M_c(i,j) \in \mathbb{R}^{c}$ is the vector of all channels at position (i,j).
Spatial feature recalibration recalibrates the importance of every spatial position across all channels and assigns it a corresponding weight, which improves the accuracy of semantic segmentation.
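The spatial feature recalibration steps can be sketched in plain Python on a c×H×W nested-list tensor. This is a minimal sketch: the channel-first layout, the omission of a bias in the 1×1 convolution and the function names are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def spatial_recalibrate(M, w):
    """M: c x H x W nested lists; w: the c unshared weights of the 1x1 conv.

    Returns the recalibrated map and the spatial weight map p.
    """
    c, H, W = len(M), len(M[0]), len(M[0][0])
    # M'(i, j) = sum_k w[k] * M[k][i][j]: a 1x1 conv with c inputs and 1 output
    m_prime = [[sum(w[k] * M[k][i][j] for k in range(c)) for j in range(W)]
               for i in range(H)]
    p = [[sigmoid(v) for v in row] for row in m_prime]  # p(i, j) in (0, 1)
    # every channel at position (i, j) is scaled by the same p(i, j)
    out = [[[p[i][j] * M[k][i][j] for j in range(W)] for i in range(H)]
           for k in range(c)]
    return out, p


fmap = [[[1.0, 2.0]], [[3.0, 4.0]]]  # c=2 channels, H=1, W=2
out, p = spatial_recalibrate(fmap, [0.0, 0.0])
print(p)    # [[0.5, 0.5]]
print(out)  # [[[0.5, 1.0]], [[1.5, 2.0]]]
```

With all-zero weights every position gets p = 0.5; learned weights make p larger where the channel combination signals an important location.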
As shown in Fig. 6, the channel feature recalibration module works as follows:
(1) The original feature map $M_c$ is passed through global average pooling to obtain a feature map $M' \in \mathbb{R}^{1 \times 1 \times c}$; M' is then fully connected with the original feature map $M_c$ to integrate the feature maps.
(2) The integrated feature map then passes through a rectified linear unit (ReLU), which rectifies the features.
(3) Finally, the rectified feature map passes through a convolution with kernel size H×W and c channels, giving a feature vector $z \in \mathbb{R}^{1 \times 1 \times c}$.
(4) The feature vector z passes through a sigmoid layer, which restricts its activations to [0,1] and yields the channel weight vector $\sigma(z) = [\sigma(z_1), \dots, \sigma(z_c)]$. The feature map obtained from $M_c$ after channel feature recalibration is

$$\hat{M}_{ch} = \left[\sigma(z_1)\, m_1,\ \sigma(z_2)\, m_2,\ \dots,\ \sigma(z_c)\, m_c\right]$$

where $m_k \in \mathbb{R}^{H \times W}$ is the k-th channel of $M_c$.
Channel feature recalibration assigns high weights to important channels, emphasizing them.
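The "full connection" between M' and $M_c$ in step (1) is not specified precisely enough to reproduce, so the Python sketch below substitutes the standard squeeze-and-excitation pattern (global average pooling, fully connected, ReLU, fully connected, sigmoid) for steps (1) to (3) and keeps only the final rescaling $\hat{m}_k = \sigma(z_k)\, m_k$ as stated. The weight matrices `W1` and `W2` and all function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_recalibrate(M, W1, W2):
    """SE-style channel recalibration of M (c x H x W nested lists).

    W1: hidden x c and W2: c x hidden fully connected weights (hypothetical
    stand-ins for the patent's "full connection" and H x W convolution).
    Returns the recalibrated map and the channel weights sigma(z).
    """
    # Squeeze: global average pooling gives one value per channel
    gap = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in M]
    # Excitation: fully connected -> ReLU -> fully connected gives z (length c)
    hidden = [max(0.0, sum(w * g for w, g in zip(row, gap))) for row in W1]
    z = [sum(w * h for w, h in zip(row, hidden)) for row in W2]
    s = [sigmoid(v) for v in z]  # channel weights in (0, 1)
    # Recalibrate: m_hat_k = sigma(z_k) * m_k
    out = [[[s[k] * v for v in row] for row in ch] for k, ch in enumerate(M)]
    return out, s


M = [[[2.0, 4.0]], [[0.0, 0.0]]]  # c=2 channels, H=1, W=2
out, s = channel_recalibrate(M, W1=[[1.0, 0.0]], W2=[[0.0], [0.0]])
print(s, out)  # [0.5, 0.5] [[[1.0, 2.0]], [[0.0, 0.0]]]
```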
Step 4: Train the entire image semantic segmentation model as follows.
Step 4.1: Preprocess the images in the training dataset by cropping them to a fixed size of 513×513.
Step 4.2: Initialize the entire image semantic segmentation model, i.e. take the parameter values of the pre-trained image semantic segmentation model as initial values.
Step 4.3: Augment the training data by flipping, scaling and rotation: images are flipped at random, rescaled by a random factor between 0.5 and 2, and rotated by a random angle between -10 and 10 degrees.
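Steps 4.1 and 4.3 fix only the crop size and the parameter ranges. The sketch below samples one set of augmentation parameters with Python's `random` module; horizontal flipping with probability 0.5, uniform sampling within each range, and padding images smaller than 513×513 before cropping are assumptions, and the function names are hypothetical. Applying the transforms to pixel data is left to an image library.

```python
import random

CROP = 513  # fixed crop size from Step 4.1


def sample_augmentation(rng):
    """One draw of the Step 4.3 parameters (distributions are assumptions)."""
    return {
        "hflip": rng.random() < 0.5,            # random flip
        "scale": rng.uniform(0.5, 2.0),         # 0.5x to 2x
        "angle_deg": rng.uniform(-10.0, 10.0),  # -10 to 10 degrees
    }


def random_crop_box(height, width, rng, crop=CROP):
    """(top, left, bottom, right) of a random crop x crop window.

    Assumes the image has already been padded to at least crop x crop.
    """
    if height < crop or width < crop:
        raise ValueError("pad the image to at least crop x crop first")
    top = rng.randint(0, height - crop)
    left = rng.randint(0, width - crop)
    return top, left, top + crop, left + crop


rng = random.Random(42)
params = sample_augmentation(rng)
h, w = 375, 500  # a typical VOC image size
h, w = round(h * params["scale"]), round(w * params["scale"])
h, w = max(h, CROP), max(w, CROP)  # pad up to the crop size if needed
print(params, random_crop_box(h, w, rng))
```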
Step 4.4: Use the sum of the per-pixel cross-entropy losses as the loss function, backpropagate the error with stochastic gradient descent, and update the model parameters with a polynomial ("poly") learning-rate policy to obtain the trained semantic segmentation model. Under the polynomial policy, the learning rate lr is set to

$$\mathit{lr} = \mathit{baselr} \times \left(1 - \frac{\mathit{iter}}{\mathit{max\_iter}}\right)^{\mathit{power}}$$

where baselr is the initial learning rate, set here to 0.001, power is set to 0.9, iter is the current iteration and max_iter is the total number of iterations.
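The polynomial schedule is easy to check numerically. The Python sketch below uses base_lr = 0.001 and power = 0.9 from the text; max_iter is not given in the patent, so the 30000 in the usage lines is a hypothetical value, as is the function name.

```python
def poly_lr(iteration, max_iter, base_lr=0.001, power=0.9):
    """'poly' schedule: base_lr * (1 - iteration / max_iter) ** power."""
    if not 0 <= iteration <= max_iter:
        raise ValueError("iteration must lie in [0, max_iter]")
    return base_lr * (1 - iteration / max_iter) ** power


for it in (0, 15000, 29999, 30000):
    print(it, poly_lr(it, max_iter=30000))
```

The rate decays smoothly from base_lr to exactly zero at the last iteration; with power below 1 it stays relatively high for most of training and drops sharply near the end.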
Step 5: Input a new image, perform a single forward pass through the trained deep neural network model, and output the predicted semantic segmentation result end to end.
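The patent does not state how the network's class scores become a label map. Taking the per-pixel arg max, as in the minimal sketch below, is the usual choice and is an assumption here; the K×H×W layout and the function name are hypothetical.

```python
def predict_labels(scores):
    """scores: K x H x W class scores -> H x W map of winning class indices.

    Ties go to the lower class index, because max() keeps the first maximum.
    """
    num_classes, height, width = len(scores), len(scores[0]), len(scores[0][0])
    return [[max(range(num_classes), key=lambda k: scores[k][i][j])
             for j in range(width)] for i in range(height)]


print(predict_labels([[[0.1, 0.9]], [[0.8, 0.2]]]))  # [[1, 0]]
```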
The principle and process of the present invention are as follows. The image semantic segmentation model uses feature map Res_1 output by Conv1, Res_2 output by Conv2_x, Res_3 output by Conv3_x and Res_4 output by Conv4_x, which come from the first, second, third and fourth stages of the front-end network (i.e. the feature extraction network), respectively. Res_1 is downsampled by a factor of 8, Res_2 by a factor of 4 and Res_3 by a factor of 2, each through a detail-preserving pooling layer; the results are concatenated with Res_4 and fed into the feature recalibration module. There, spatial feature recalibration recalibrates the importance of each spatial position across all channels and assigns it a corresponding weight, improving segmentation accuracy, while channel feature recalibration assigns high weights to important channels, emphasizing them.

The feature map output by the feature recalibration module then passes through a 1×1 convolutional layer to give feature map De_1. De_1 is upsampled to 28×28, passed through two 3×3 convolutions with batch normalization and ReLU, and concatenated with Res_3 to give De_2. De_2 is upsampled to 56×56, passed through two 3×3 convolutions with batch normalization and ReLU, and concatenated with Res_2 to give De_3. De_3 is upsampled to 112×112 and passed through two 3×3 convolutions with batch normalization and ReLU to give De_4. Finally, De_4 passes through a variable-weight global pooling layer and is upsampled to the size of the original image; the cross-entropy against the semantic segmentation annotation is computed and the error is backpropagated to train the semantic segmentation network model.

Variable-weight global pooling addresses the fact that conventional global average pooling applies the same operation, i.e. a 1×1 convolution, at the same position of every feature channel, and therefore cannot emphasize the correct class of each pixel in semantic segmentation. A weight vector is added to the 1×1 convolution in global average pooling and its parameters are initialized from a standard Gaussian distribution; during training, backpropagation assigns high weights to pixels belonging to the target class, which improves pixel-wise classification and also speeds up convergence. The present invention achieves an mIoU of 76.33% on the VOC 2012 semantic segmentation dataset.
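The reported figure is mean intersection-over-union (mIoU). As a reminder of how the metric is computed, the sketch below accumulates a confusion matrix and averages per-class IoU = TP / (TP + FP + FN). The function names, the 255 ignore label and skipping classes absent from both prediction and ground truth are assumptions. For the VOC benchmark, the confusion matrix is accumulated over the whole evaluation set before averaging, not computed per image.

```python
def update_confusion(conf, pred, target, ignore_index=255):
    """Add one image's H x W prediction/target pair to conf[true][pred]."""
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            if t != ignore_index:
                conf[t][p] += 1
    return conf


def mean_iou(conf):
    """Mean over classes of TP / (TP + FP + FN), skipping classes never seen."""
    n = len(conf)
    ious = []
    for k in range(n):
        tp = conf[k][k]
        fp = sum(conf[r][k] for r in range(n)) - tp
        fn = sum(conf[k]) - tp
        if tp + fp + fn:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)


conf = [[0, 0], [0, 0]]
update_confusion(conf, pred=[[0, 1], [1, 1]], target=[[0, 1], [0, 1]])
print(mean_iou(conf))  # (1/2 + 2/3) / 2 = 0.5833...
```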
All technical variations made according to the technical solution of the present invention fall within the protection scope of the present invention.
Claims (6)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810947884.XA (CN109101975B) | 2018-08-20 | 2018-08-20 | Image semantic segmentation method based on full convolution neural network |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810947884.XA (CN109101975B) | 2018-08-20 | 2018-08-20 | Image semantic segmentation method based on full convolution neural network |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN109101975A | 2018-12-28 |
| CN109101975B | 2022-01-25 |
Family
ID=64850450
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810947884.XA (CN109101975B, Expired - Fee Related) | Image semantic segmentation method based on full convolution neural network | 2018-08-20 | 2018-08-20 |
Country Status (1)
| Country | Link |
|---|---|
| CN | CN109101975B |
Families Citing this family (33)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109886082A * | 2019-01-03 | 2019-06-14 | 南京理工大学 | A detection method of small target enhanced prediction module based on SSD |
| CN109886273B * | 2019-02-26 | 2022-12-16 | 四川大学华西医院 | A CMR Image Segmentation and Classification System |
| CN109919080B * | 2019-03-05 | 2019-10-11 | 南京航空航天大学 | Multi-decoder fully convolutional neural network and its corresponding mesostructure recognition method |
| CN110225342B * | 2019-04-10 | 2021-03-09 | 中国科学技术大学 | Bit allocation system and method for video coding based on semantic distortion metrics |
| CN110084319B * | 2019-05-07 | 2023-06-30 | 上海宝尊电子商务有限公司 | Fashion image clothing collar type recognition method and system based on deep neural network |
| CN110147763B * | 2019-05-20 | 2023-02-24 | 哈尔滨工业大学 | Video semantic segmentation method based on convolutional neural network |
| CN110148142B * | 2019-05-27 | 2023-04-18 | 腾讯科技(深圳)有限公司 | Training method, device and equipment of image segmentation model and storage medium |
| CN110189334B * | 2019-05-28 | 2022-08-09 | 南京邮电大学 | Medical image segmentation method of residual error type full convolution neural network based on attention mechanism |
| CN110276267A * | 2019-05-28 | 2019-09-24 | 江苏金海星导航科技有限公司 | Method for detecting lane lines based on Spatial-LargeFOV deep learning network |
| CN110232693B * | 2019-06-12 | 2022-12-09 | 桂林电子科技大学 | An Image Segmentation Method Combining Heat Map Channel and Improved U-Net |
| CN110263706B * | 2019-06-19 | 2021-07-27 | 南京邮电大学 | A method for dynamic target detection and recognition in vehicle-mounted video in haze weather |
| CN110363134B * | 2019-07-10 | 2021-06-08 | 电子科技大学 | Human face shielding area positioning method based on semantic segmentation |
| CN110428009B * | 2019-08-02 | 2020-06-16 | 南京航空航天大学 | Full convolution neural network and corresponding mesoscopic structure identification method |
| CN110517272B * | 2019-08-29 | 2022-03-25 | 电子科技大学 | Deep learning-based blood cell segmentation method |
| CN110689020A * | 2019-10-10 | 2020-01-14 | 湖南师范大学 | Segmentation method of mineral flotation froth image and electronic equipment |
| CN110930409B * | 2019-10-18 | 2022-10-14 | 电子科技大学 | Salt body semantic segmentation method and semantic segmentation system based on deep learning |
| CN110782023B * | 2019-11-04 | 2023-04-07 | 华南理工大学 | Reduction residual module porous convolution architecture network and rapid semantic segmentation method |
| CN111091550A * | 2019-12-12 | 2020-05-01 | 创新奇智(北京)科技有限公司 | Multi-size self-adaptive PCB solder paste area detection system and detection method |
| CN111192248B * | 2019-12-30 | 2023-05-05 | 山东大学 | A multi-task relational learning method for vertebral body localization, recognition and segmentation in magnetic resonance imaging |
| CN111209972A * | 2020-01-09 | 2020-05-29 | 中国科学院计算技术研究所 | Image classification method and system based on hybrid connectivity deep convolutional neural network |
| CN111275712B * | 2020-01-15 | 2022-03-01 | 浙江工业大学 | Residual semantic network training method oriented to large-scale image data |
| CN111259906B * | 2020-01-17 | 2023-04-07 | 陕西师范大学 | Method for generating remote sensing image target segmentation countermeasures under condition containing multilevel channel attention |
| CN111489364B * | 2020-04-08 | 2022-05-03 | 重庆邮电大学 | Medical image segmentation method based on lightweight full convolution neural network |
| CN111612740B * | 2020-04-16 | 2023-07-25 | 深圳大学 | Method and device for pathological image processing |
| CN111862190B * | 2020-07-10 | 2024-04-05 | 北京农业生物技术研究中心 | Method and device for automatically measuring area of soft rot disease spots of isolated plants |
| CN111860386B * | 2020-07-27 | 2022-04-08 | 山东大学 | A Video Semantic Segmentation Method Based on ConvLSTM Convolutional Neural Network |
| CN112001916B * | 2020-09-02 | 2024-08-02 | 苏州三仲信息科技有限公司 | Method for training neural network and obtaining mounting hole positions and corresponding device |
| CN112364878A * | 2020-09-25 | 2021-02-12 | 江苏师范大学 | Power line classification method based on deep learning under complex background |
| CN112766279B * | 2020-12-31 | 2023-04-07 | 中国船舶重工集团公司第七0九研究所 | Image feature extraction method based on combined attention mechanism |
| CN113487524B * | 2021-04-07 | 2023-05-12 | 北京百度网讯科技有限公司 | Image format conversion method, apparatus, device, storage medium, and program product |
| CN114255255B * | 2021-11-16 | 2025-01-07 | 中国航空工业集团公司雷华电子技术研究所 | A real-time coastline extraction method |
| CN114529877B * | 2022-01-24 | 2024-09-06 | 华南理工大学 | Pavement weather identification method based on pavement semantic segmentation and convolutional neural network |
| CN114677510B * | 2022-03-22 | 2025-04-15 | 中南大学 | Feature map upsampling method, small target semantic segmentation method and imaging method |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107480726A * | 2017-08-25 | 2017-12-15 | 电子科技大学 | Scene semantic segmentation method based on fully convolutional network and long short-term memory unit |
| CN107784654A * | 2016-08-26 | 2018-03-09 | 杭州海康威视数字技术股份有限公司 | Image segmentation method, device and fully convolutional network system |
| CN108062756A * | 2018-01-29 | 2018-05-22 | 重庆理工大学 | Image semantic segmentation method based on deep fully convolutional network and conditional random field |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10303979B2 * | 2016-11-16 | 2019-05-28 | Phenomic Ai Inc. | System and method for classifying and segmenting microscopy images with deep multiple instance learning |
- 2018-08-20: CN application CN201810947884.XA, published as CN109101975B (Expired - Fee Related)
Non-Patent Citations (4)
| Title |
|---|
| Attention U-Net: Learning Where to Look for the Pancreas; Ozan Oktay et al.; arXiv:1804.03999v1 [cs.CV]; 2018-04-11; full text * |
| Concurrent Spatial and Channel 'Squeeze & Excitation' in Fully Convolutional Networks; Abhijit Guha Roy et al.; arXiv:1803.02579v2 [cs.CV]; 2018-06-08; full text * |
| Detail-Preserving Pooling in Deep Networks; Faraz Saeedan et al.; arXiv:1804.04076v1; 2018-04-11; full text * |
| Fast aircraft detection based on multi-layer feature fusion in fully convolutional networks; Xin Peng et al.; Acta Optica Sinica; 2018-03-31; Vol. 38, No. 3; full text * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN109101975A (en) | 2018-12-28 |
Similar Documents
| Publication | Title |
|---|---|
| CN109101975B | Image semantic segmentation method based on full convolution neural network |
| CN109241972B | Image Semantic Segmentation Method Based on Deep Learning |
| CN109360171B | A real-time deblurring method of video images based on neural network |
| CN112750082A | Face super-resolution method and system based on fusion attention mechanism |
| CN111369442B | Remote sensing image super-resolution reconstruction method based on fuzzy kernel classification and attention mechanism |
| CN110633661A | A remote sensing image object detection method fused with semantic segmentation |
| Zeng et al. | Single image super-resolution using a polymorphic parallel CNN |
| CN110309835B | A method and device for extracting local features of an image |
| CN108509978A | Multi-class target detection method and model based on CNN multi-stage feature fusion |
| CN112396645A | Monocular image depth estimation method and system based on convolution residual learning |
| CN113177882A | Single-frame image super-resolution processing method based on diffusion model |
| CN110517272B | Deep learning-based blood cell segmentation method |
| WO2023206343A1 | Image super-resolution method based on image pre-training strategy |
| CN113298097B | Feature point extraction method and device based on convolutional neural network and storage medium |
| CN113284059A | Model training method, image enhancement method, device, electronic device and medium |
| CN116188272B | Two-stage depth network image super-resolution reconstruction method suitable for multiple fuzzy cores |
| CN113449612B | Three-dimensional target point cloud identification method based on sub-flow sparse convolution |
| CN116703725A | Method for realizing super resolution for real world text image by double branch network for sensing multiple characteristics |
| CN114862679A | Single image super-resolution reconstruction method based on residual generative adversarial network |
| CN112200752B | A multi-frame image deblurring system based on ER network and its method |
| CN105590296B | Single-frame image super-resolution method based on double-dictionary learning |
| CN113096032A | Non-uniform blur removing method based on image area division |
| CN111382845B | Template reconstruction method based on self-attention mechanism |
| CN117036884A | Remote sensing image space-time fusion method based on self-adaptive normalization and attention mechanism |
| Lee et al. | SAF-Nets: Shape-adaptive filter networks for 3D point cloud processing |
Legal Events
| Code | Title | Description |
|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2022-01-25 |