
CN111950551A - A target detection method based on convolutional neural network - Google Patents


Info

Publication number
CN111950551A
CN111950551A
Authority
CN
China
Prior art keywords
feature map
region
neural network
convolution
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010816397.7A
Other languages
Chinese (zh)
Other versions
CN111950551B (en)
Inventor
李松江
吴宁
王鹏
Current Assignee
Changchun University of Science and Technology
Original Assignee
Changchun University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Changchun University of Science and Technology
Priority to CN202010816397.7A
Publication of CN111950551A
Application granted
Publication of CN111950551B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods


Abstract

The invention relates to a target detection method based on a convolutional neural network, comprising: performing feature extraction based on a residual convolutional neural network to obtain layer-by-layer basic feature maps; fusing the basic feature maps in order from shallow to deep to obtain a fused feature map; extracting candidate boxes from the fused feature map based on a region proposal network to obtain a candidate target region feature map; obtaining a region-of-interest feature map according to the fused feature map and the candidate target region feature map; and obtaining classification scores and bounding-box regressions based on fully convolutional layers according to the region-of-interest feature map. The present invention achieves high detection accuracy for small targets and occluded targets.

Description

A target detection method based on a convolutional neural network

TECHNICAL FIELD

The invention relates to the technical field of image information processing, and in particular to a target detection method based on a convolutional neural network.

BACKGROUND

With the increasing pressure of road traffic, intelligent management and control of road vehicles through computer technology has become a research hotspot. Detecting vehicle targets with road monitoring equipment and mastering the vehicle data and driving trajectories of the road network are prerequisites for optimizing traffic and relieving traffic pressure; vehicle target detection is also the research basis for autonomous driving, vehicle tracking, and vehicle feature recognition.

At present, convolutional neural networks are widely used in vehicle target detection. Common algorithms fall into single-stage and two-stage detection algorithms: a single-stage detection algorithm is a regression-based target detection algorithm, while a two-stage detection algorithm first generates candidate regions and then classifies and refines them. Owing to this structural difference, two-stage detection algorithms achieve higher detection accuracy but lower detection speed than single-stage algorithms, making them suitable for scenarios that demand high detection accuracy.

Existing two-stage target detection algorithms have the following problem: because occluded targets and small targets have few features, existing algorithms make insufficient use of shallow position information and context information, so detection accuracy for small and occluded targets is low.

SUMMARY OF THE INVENTION

The purpose of the present invention is to provide a target detection method based on a convolutional neural network that has high detection accuracy for small targets and occluded targets.

To achieve the above purpose, the present invention provides the following scheme:

A target detection method based on a convolutional neural network, comprising:

performing feature extraction based on a residual convolutional neural network to obtain layer-by-layer basic feature maps;

fusing the basic feature maps in order from shallow to deep to obtain a fused feature map;

extracting candidate boxes from the fused feature map based on a region proposal network to obtain a candidate target region feature map;

obtaining a region-of-interest feature map according to the fused feature map and the candidate target region feature map; and

obtaining classification scores and bounding-box regressions based on fully convolutional layers according to the region-of-interest feature map.

Preferably, the basic feature maps include a first feature map, a second feature map, a third feature map, and a fourth feature map.

Preferably, fusing the basic feature maps in order from shallow to deep to obtain the fused feature map includes:

performing downsampling on the first feature map to obtain a downsampled feature map;

performing convolutional dimension reduction on the second feature map to obtain a dimension-reduced feature map, where the number of channels of the dimension-reduced feature map is the same as the number of channels of the downsampled feature map; and

fusing the downsampled feature map with the dimension-reduced feature map to obtain an initial fused feature map, the fused feature map being finally obtained in the same way.

Preferably, performing downsampling on the first feature map to obtain the downsampled feature map includes:

downsampling the first feature map separately through n branches of dilated convolution, where n is a positive integer greater than 1; and

fusing the first feature maps downsampled by the dilated convolution of each branch to obtain the downsampled feature map.

Preferably, n is 3, and the dilation rates of the three branches are 1, 2, and 3, respectively.

Preferably, extracting candidate boxes from the fused feature map based on the region proposal network to obtain the candidate target region feature map includes:

performing convolution on the fused feature map with a first set convolution kernel to obtain a first convolutional feature map;

performing convolution on the first convolutional feature map with a second set convolution kernel to obtain a second convolutional feature map;

performing convolution on the second convolutional feature map with the second set convolution kernel to obtain a third convolutional feature map; and

inputting the second convolutional feature map and the third convolutional feature map into two parallel fully connected layers, respectively, and processing them based on set anchor boxes to obtain the candidate target region feature map.

Preferably, obtaining the classification scores and bounding-box regressions based on fully convolutional layers according to the region-of-interest feature map includes:

obtaining initial classification scores and initial bounding-box regressions based on fully convolutional layers according to the region-of-interest feature map; and

replacing the set anchor boxes with the initial bounding-box regressions and repeating the subsequent steps; by setting m thresholds and repeating this process m times, the classification scores and the bounding-box regressions are obtained, where m is a positive integer greater than or equal to 1.

Preferably, the first set convolution kernel is 3×3 and the second set convolution kernel is 1×1.

Preferably, obtaining the region-of-interest feature map according to the fused feature map and the candidate target region feature map includes:

fusing the fused feature map and the candidate target region feature map based on ROI Align to obtain an initial region-of-interest feature map;

enlarging the initial region-of-interest feature map by a set multiple to obtain an enlarged region-of-interest feature map;

performing global context extraction on the initial region-of-interest feature map based on the enlarged region-of-interest feature map to obtain context information; and

fusing the initial region-of-interest feature map with the context information based on ROI Align to obtain the region-of-interest feature map.

Preferably, the residual convolutional neural network is a ResNet-101 network.

According to the specific embodiments provided by the present invention, the present invention discloses the following technical effects:

The invention relates to a target detection method based on a convolutional neural network, comprising: performing feature extraction based on a residual convolutional neural network to obtain layer-by-layer basic feature maps; fusing the basic feature maps in order from shallow to deep to obtain a fused feature map; extracting candidate boxes from the fused feature map based on a region proposal network to obtain a candidate target region feature map; obtaining a region-of-interest feature map according to the fused feature map and the candidate target region feature map; and obtaining classification scores and bounding-box regressions based on fully convolutional layers according to the region-of-interest feature map. The present invention achieves high detection accuracy for small targets and occluded targets.

BRIEF DESCRIPTION OF THE DRAWINGS

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a flowchart of the target detection method based on a convolutional neural network according to the present invention.

DETAILED DESCRIPTION

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

The purpose of the present invention is to provide a target detection method based on a convolutional neural network that has high detection accuracy for small targets and occluded targets.

To make the above objects, features, and advantages of the present invention more comprehensible, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

FIG. 1 is a flowchart of the target detection method based on a convolutional neural network according to the present invention. As shown in FIG. 1, the present invention provides a target detection method based on a convolutional neural network, comprising:

Step S1: perform feature extraction based on the residual convolutional neural network ResNet-101 to obtain layer-by-layer basic feature maps, specifically a first feature map, a second feature map, a third feature map, and a fourth feature map. In this embodiment, the convolutional layers of ResNet-101 are detailed in Table 1.

Table 1. Convolutional layers of ResNet-101

(Table 1 appears as an image in the original publication and is not reproduced here.)

where w is the width of the region of interest and h is the height of the region of interest.

Step S2: fuse the basic feature maps in order from shallow to deep to obtain the fused feature map.

The fusion of the first feature map and the second feature map is taken as an example; the specific process is as follows:

Downsample the first feature map separately through n branches of dilated convolution, where n is a positive integer greater than 1. In this embodiment, n is 3, the convolution kernel size is 3×3, the convolution stride is 2, and the dilation rates of the three branches are 1, 2, and 3, respectively.

Fuse the first feature maps downsampled by the dilated convolution of each branch to obtain the downsampled feature map. The specific calculation formula is:

F = H_{3,1}(x) + H_{3,2}(x) + H_{3,3}(x)

where F denotes the fused downsampled feature map, H_{k,r}(x) denotes a dilated convolution, k denotes the convolution kernel size, r denotes the dilation rate, and x is the first feature map.
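The three-branch sum above can be sketched numerically. The following is a minimal single-channel, pure-Python sketch, not the patent's implementation: the 8×8 all-ones input, the 3×3 averaging kernel, and zero "same" padding are illustrative assumptions, and a real system would use a deep-learning framework's dilated convolution.

```python
def dilated_conv2d(x, k, rate, stride=2):
    """Single-channel, zero-padded ('same') dilated convolution: a toy
    stand-in for one branch H_{k,r} with stride 2, as in the embodiment."""
    n, kk = len(x), len(k)
    pad = rate * (kk // 2)
    # zero-pad the square input
    p = [[0.0] * (n + 2 * pad) for _ in range(n + 2 * pad)]
    for i in range(n):
        for j in range(n):
            p[i + pad][j + pad] = x[i][j]
    out = []
    for i in range(0, n, stride):          # stride-2 downsampling
        row = []
        for j in range(0, n, stride):
            s = 0.0
            for a in range(kk):            # taps spaced `rate` apart
                for b in range(kk):
                    s += k[a][b] * p[i + pad + (a - kk // 2) * rate][
                                     j + pad + (b - kk // 2) * rate]
            row.append(s)
        out.append(row)
    return out

x = [[1.0] * 8 for _ in range(8)]          # dummy 8x8 "first feature map"
k = [[1.0 / 9] * 3 for _ in range(3)]      # 3x3 averaging kernel
branches = [dilated_conv2d(x, k, rate) for rate in (1, 2, 3)]
# element-wise sum of the branches: F = H_{3,1}(x) + H_{3,2}(x) + H_{3,3}(x)
F = [[sum(b[i][j] for b in branches) for j in range(4)] for i in range(4)]
```

Note how the three branches share one spatial grid (8×8 halved to 4×4), so their outputs can be added element-wise, which is what makes the formula well defined.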

Apply a 1×1 convolution kernel to the second feature map for dimension reduction to obtain the dimension-reduced feature map; the number of channels of the dimension-reduced feature map is the same as the number of channels of the downsampled feature map.

Fuse the downsampled feature map with the dimension-reduced feature map to obtain the initial fused feature map.

Carry out fusion in sequence according to the above steps to obtain the fused feature map.
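Tracked at the level of tensor shapes, the shallow-to-deep chain of Step S2 can be sketched as follows. This only checks shape bookkeeping; the four stage shapes and the choice to keep the shallow map's channel count through the branch sum are assumptions, since the patent fixes neither.

```python
# Hypothetical ResNet-101 stage outputs as (channels, height, width);
# the patent names four basic feature maps but gives no sizes.
maps = [(256, 200, 200), (512, 100, 100), (1024, 50, 50), (2048, 25, 25)]

def fuse_pair(shallow, deep):
    """Shape-level fusion of one shallow/deep pair: the shallow map is
    downsampled 2x by the stride-2 branches, the deep map is 1x1-reduced
    to the shallow map's channel count, and the two are added."""
    c_s, h_s, w_s = shallow
    c_d, h_d, w_d = deep
    down = (c_s, h_s // 2, w_s // 2)   # stride-2 dilated-conv branches
    reduced = (c_s, h_d, w_d)          # 1x1 conv matches channels to c_s
    assert down[1:] == reduced[1:], "spatial sizes must align before adding"
    return down                        # element-wise sum keeps this shape

fused = maps[0]
for deep in maps[1:]:                  # fuse in order, shallow to deep
    fused = fuse_pair(fused, deep)
```

Each pairwise fusion halves the running spatial size while the deeper map's 1×1 reduction keeps channel counts compatible, so the chain terminates at the deepest stage's resolution.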

Step S3: extract candidate boxes from the fused feature map based on the region proposal network to obtain the candidate target region feature map.

As an optional implementation, step S3 of the present invention includes:

Step S31: perform convolution on the fused feature map with the first set convolution kernel to obtain the first convolutional feature map. In this embodiment, the first set convolution kernel is 3×3.

Step S32: perform convolution on the first convolutional feature map with the second set convolution kernel to obtain the second convolutional feature map. In this embodiment, the second set convolution kernel is 1×1.

Step S33: perform convolution on the second convolutional feature map with the second set convolution kernel to obtain the third convolutional feature map.

Step S34: input the second convolutional feature map and the third convolutional feature map into two parallel fully connected layers, respectively, and process them based on the set anchor boxes to obtain the candidate target region feature map.
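A shape walkthrough of steps S31 to S33, under stated assumptions: the patent gives kernel sizes (3×3, then 1×1) but not channel counts or the input size, so the 256/512 channels and the 38×50 fused map below are placeholders chosen only to make the bookkeeping concrete.

```python
def conv_shape(shape, k, out_ch, stride=1):
    """Output shape (C, H, W) of a 'same'-padded convolution with an odd
    kernel size k; padding of k // 2 preserves H and W when stride is 1."""
    c, h, w = shape
    pad = k // 2
    return (out_ch,
            (h + 2 * pad - k) // stride + 1,
            (w + 2 * pad - k) // stride + 1)

fused = (256, 38, 50)           # hypothetical fused feature map
f1 = conv_shape(fused, 3, 512)  # step S31: 3x3 conv
f2 = conv_shape(f1, 1, 256)     # step S32: 1x1 conv on f1
f3 = conv_shape(f2, 1, 256)     # step S33: 1x1 conv on f2
# f2 and f3 then feed the two parallel layers of step S34
```

With "same" padding neither kernel changes the spatial grid, so every anchor location in the fused map has a corresponding position in f2 and f3 for the parallel heads of step S34.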

Step S4: obtain the region-of-interest feature map according to the fused feature map and the candidate target region feature map.

Specifically, step S4 includes:

Step S41: fuse the fused feature map and the candidate target region feature map based on ROI Align to obtain the initial region-of-interest feature map.

Step S42: enlarge the initial region-of-interest feature map by a set multiple to obtain the enlarged region-of-interest feature map. In this embodiment, the set multiple is 1.5.
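The 1.5x enlargement of step S42 can be sketched on box coordinates. The 1.5 factor comes from this embodiment; the (x1, y1, x2, y2) box representation, the centre-anchored scaling, and the clipping to assumed image bounds are illustrative choices not specified by the patent.

```python
def enlarge_roi(box, scale=1.5, img_w=1000, img_h=600):
    """Enlarge an (x1, y1, x2, y2) ROI about its centre by `scale`,
    then clip the result to the image; img_w/img_h are assumed bounds."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2          # box centre
    hw = (x2 - x1) * scale / 2                     # enlarged half-width
    hh = (y2 - y1) * scale / 2                     # enlarged half-height
    return (max(0.0, cx - hw), max(0.0, cy - hh),
            min(float(img_w), cx + hw), min(float(img_h), cy + hh))

big = enlarge_roi((100, 100, 300, 200))            # -> (50.0, 75.0, 350.0, 225.0)
```

The enlarged box is what step S43 samples for surrounding context, which is why it is anchored on the same centre as the original ROI.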

Step S43: based on the enlarged region-of-interest feature map, perform global context extraction on the initial region-of-interest feature map in the four directions (up, down, left, and right) to obtain the context information.

Step S44: based on ROI Align, map the initial region-of-interest feature map and the context information into rectangular boxes of the same size and fuse them to obtain the region-of-interest feature map.

Step S5: obtain the classification scores and bounding-box regressions based on fully convolutional layers according to the region-of-interest feature map.

Specifically, obtain initial classification scores and initial bounding-box regressions based on fully convolutional layers according to the region-of-interest feature map.

Replace the set anchor boxes with the initial bounding-box regressions and repeat the subsequent steps; by setting m thresholds and repeating this process m times, the classification scores and the bounding-box regressions are obtained, where m is a positive integer greater than or equal to 1. In this embodiment, m is 3, and the three thresholds are 0.5, 0.6, and 0.7, respectively.
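The m-stage refinement of step S5 (m = 3 with thresholds 0.5, 0.6, 0.7 in this embodiment) resembles a cascade detector. The sketch below is a toy stand-in: the per-stage IoU gating follows the text's thresholds, but the "regressor" that moves the box halfway toward the ground truth is a made-up placeholder for the learned bounding-box regression.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def cascade_refine(proposal, gt, thresholds=(0.5, 0.6, 0.7)):
    """Toy cascade: each stage keeps the box only if its IoU with the
    ground truth clears that stage's rising threshold, then 'refines' it
    by moving it halfway toward the ground truth (mock regression)."""
    box = proposal
    for t in thresholds:
        if iou(box, gt) < t:
            return None                                   # rejected here
        box = tuple((p + g) / 2 for p, g in zip(box, gt))  # mock regressor
    return box
```

The rising thresholds mean each stage sees progressively better-aligned boxes, which is the usual rationale for cascaded refinement.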

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another.

Specific examples are used herein to explain the principles and implementations of the present invention. The description of the above embodiments is only intended to help understand the method and core idea of the present invention. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the present invention. In conclusion, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

1. A target detection method based on a convolutional neural network, comprising:
performing feature extraction based on a residual convolutional neural network to obtain layer-by-layer basic feature maps;
fusing the basic feature maps in order from shallow to deep to obtain a fused feature map;
extracting candidate boxes from the fused feature map based on a region proposal network to obtain a candidate target region feature map;
obtaining a region-of-interest feature map according to the fused feature map and the candidate target region feature map; and
obtaining classification scores and bounding-box regressions based on fully convolutional layers according to the region-of-interest feature map.

2. The method according to claim 1, wherein the basic feature maps comprise a first feature map, a second feature map, a third feature map, and a fourth feature map.

3. The method according to claim 2, wherein fusing the basic feature maps in order from shallow to deep to obtain the fused feature map comprises:
performing downsampling on the first feature map to obtain a downsampled feature map;
performing convolutional dimension reduction on the second feature map to obtain a dimension-reduced feature map, wherein the number of channels of the dimension-reduced feature map is the same as the number of channels of the downsampled feature map; and
fusing the downsampled feature map with the dimension-reduced feature map to obtain an initial fused feature map, the fused feature map being finally obtained in the same way.

4. The method according to claim 3, wherein performing downsampling on the first feature map to obtain the downsampled feature map comprises:
downsampling the first feature map separately through n branches of dilated convolution, wherein n is a positive integer greater than 1; and
fusing the first feature maps downsampled by the dilated convolution of each branch to obtain the downsampled feature map.

5. The method according to claim 4, wherein n is 3, and the dilation rates of the three branches are 1, 2, and 3, respectively.

6. The method according to claim 1, wherein extracting candidate boxes from the fused feature map based on the region proposal network to obtain the candidate target region feature map comprises:
performing convolution on the fused feature map with a first set convolution kernel to obtain a first convolutional feature map;
performing convolution on the first convolutional feature map with a second set convolution kernel to obtain a second convolutional feature map;
performing convolution on the second convolutional feature map with the second set convolution kernel to obtain a third convolutional feature map; and
inputting the second convolutional feature map and the third convolutional feature map into two parallel fully connected layers, respectively, and processing them based on set anchor boxes to obtain the candidate target region feature map.

7. The method according to claim 6, wherein obtaining the classification scores and bounding-box regressions based on fully convolutional layers according to the region-of-interest feature map comprises:
obtaining initial classification scores and initial bounding-box regressions based on fully convolutional layers according to the region-of-interest feature map; and
replacing the set anchor boxes with the initial bounding-box regressions and repeating the subsequent steps, wherein by setting m thresholds and repeating this process m times, the classification scores and the bounding-box regressions are obtained, m being a positive integer greater than or equal to 1.

8. The method according to claim 6, wherein the first set convolution kernel is 3×3 and the second set convolution kernel is 1×1.

9. The method according to claim 1, wherein obtaining the region-of-interest feature map according to the fused feature map and the candidate target region feature map comprises:
fusing the fused feature map and the candidate target region feature map based on ROI Align to obtain an initial region-of-interest feature map;
enlarging the initial region-of-interest feature map by a set multiple to obtain an enlarged region-of-interest feature map;
performing global context extraction on the initial region-of-interest feature map based on the enlarged region-of-interest feature map to obtain context information; and
fusing the initial region-of-interest feature map with the context information based on ROI Align to obtain the region-of-interest feature map.

10. The method according to claim 1, wherein the residual convolutional neural network is a ResNet-101 network.
CN202010816397.7A 2020-08-14 2020-08-14 Target detection method based on convolutional neural network Active CN111950551B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010816397.7A CN111950551B (en) 2020-08-14 2020-08-14 Target detection method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010816397.7A CN111950551B (en) 2020-08-14 2020-08-14 Target detection method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN111950551A (en) 2020-11-17
CN111950551B (en) 2024-03-08

Family

ID=73342163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010816397.7A Active CN111950551B (en) 2020-08-14 2020-08-14 Target detection method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN111950551B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180068198A1 (en) * 2016-09-06 2018-03-08 Carnegie Mellon University Methods and Software for Detecting Objects in an Image Using Contextual Multiscale Fast Region-Based Convolutional Neural Network
CN109165644A (en) * 2018-07-13 2019-01-08 北京市商汤科技开发有限公司 Object detection method and device, electronic equipment, storage medium, program product
CN110348384A (en) * 2019-07-12 2019-10-18 沈阳理工大学 A kind of Small object vehicle attribute recognition methods based on Fusion Features
CN111461145A (en) * 2020-03-31 2020-07-28 中国科学院计算技术研究所 Method for detecting target based on convolutional neural network
CN111507998A (en) * 2020-04-20 2020-08-07 南京航空航天大学 Depth cascade-based multi-scale excitation mechanism tunnel surface defect segmentation method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHANG TANG et al.: "DeFusionNET: Defocus Blur Detection via Recurrently Fusing and Refining Multi-Scale Deep Features", PROCEEDINGS OF THE IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 31 December 2019 (2019-12-31), pages 2700 - 2709 *
吕俊奇; 邱卫根; 张立臣; 李雪武: "Pedestrian Detection with Multi-Layer Convolutional Feature Fusion", Computer Engineering and Design, no. 11 *
裴伟; 许晏铭; 朱永英; 王鹏乾; 鲁明羽; 李飞: "An Improved SSD Method for Aerial Target Detection", Journal of Software, no. 003, 31 December 2019 (2019-12-31), pages 738 - 758 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112419292A (en) * 2020-11-30 2021-02-26 深圳云天励飞技术股份有限公司 Pathological image processing method and device, electronic equipment and storage medium
CN112419292B (en) * 2020-11-30 2024-03-26 深圳云天励飞技术股份有限公司 Pathological image processing method and device, electronic equipment and storage medium
CN114782676A (en) * 2022-04-02 2022-07-22 北京广播电视台 Method and system for extracting region of interest of video
CN114782676B (en) * 2022-04-02 2023-01-06 北京广播电视台 Method and system for extracting region of interest of video

Also Published As

Publication number Publication date
CN111950551B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
CN112884064B (en) A target detection and recognition method based on a neural network
CN107169421A (en) A driving-scene object detection method based on deep convolutional neural networks
CN114639042A (en) Video target detection algorithm based on improved CenterNet backbone network
CN111402292B (en) Image sequence optical flow calculation method based on characteristic deformation error occlusion detection
CN115147745A (en) Small target detection method based on urban unmanned aerial vehicle image
CN112766123B (en) A crowd counting method and system based on vertical and horizontal cross attention network
CN108960115A (en) A multi-directional text detection method based on corner points
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN114519819A (en) Remote sensing image target detection method based on global context awareness
CN112016489A (en) Pedestrian re-identification method capable of retaining global information and enhancing local features
CN111612825B (en) Motion occlusion detection method for image sequences based on optical flow and multi-scale context
CN109426773A (en) A road recognition method and device
Nie et al. MIGN: Multiscale image generation network for remote sensing image semantic segmentation
CN111242026A (en) Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN110443142A (en) A deep-learning vehicle counting method based on road surface extraction and segmentation
CN111291760A (en) Semantic segmentation method and device for images and electronic equipment
CN109993772B (en) Example level feature aggregation method based on space-time sampling
CN111950551A (en) A target detection method based on convolutional neural network
CN111753714A (en) A multi-directional natural scene text detection method based on character segmentation
CN110751076A (en) Vehicle detection method
CN110544268A (en) A multi-target tracking method based on structured light and SiamMask network
CN117036412A (en) Twin network infrared pedestrian target tracking method integrating deformable convolution
CN109859222A (en) Edge extraction method and system based on a cascade neural network
CN110516640B (en) Vehicle re-identification method based on feature pyramid joint representation
CN114926682A (en) Local outlier factor-based industrial image anomaly detection and positioning method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant