CN118072163A - Neural network-based method and system for detecting illegal occupation of territorial cultivated land - Google Patents
- Publication number: CN118072163A
- Application number: CN202410153262.5A
- Authority
- CN
- China
- Prior art keywords
- feature
- module
- neural network
- illegal occupation
- cultivated land
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
- G06V10/763—Non-hierarchical techniques, e.g. based on statistics of modelling distributions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Description
Technical Field
The present invention relates to the field of computer vision, and in particular to a neural-network-based method and system for high-precision detection of illegal occupation of territorial cultivated land.
Background Art
Early land-resource monitoring consumed substantial manpower and material resources, yet offered poor timeliness and accuracy. With the rapid development of science and technology, faster and more efficient techniques have since emerged. Satellite remote sensing provides rich information over a wide monitoring range and typically relies on change detection and land-parcel analysis, but its limited image resolution leads to low detection accuracy and poor timeliness, and it is constrained by orbit schedules and cloud cover. UAV aerial photography is better suited to monitoring small and medium-sized targets on cultivated land, but it also demands considerable manpower and materials, cannot monitor in real time, and is unstable for long-term monitoring of a fixed site.
Summary of the Invention
The present invention provides a neural-network-based method and system for high-precision detection of illegal occupation of territorial cultivated land. The technical problem solved is how to achieve high-precision, real-time, intelligent detection of such illegal occupation.
To solve the above technical problem, the present invention provides a neural-network-based high-precision detection method for illegal occupation of territorial cultivated land, comprising the steps of:
improving an SSD object detection neural network model to obtain an improved SSD object detection neural network model;
collecting pictures of illegal occupation within territorial cultivated land, classifying the types of illegal occupation, and constructing a data set;
training the improved SSD object detection neural network model on the data set to obtain a cultivated-land illegal-occupation detection model;
using the cultivated-land illegal-occupation detection model to detect illegal occupation in cultivated-land images and obtain detection results.
Further, the SSD object detection neural network model is improved as follows:
the feature extraction network VGG-16 of the SSD model is replaced with ResNet-50, and a feature fusion module is embedded in ResNet-50 to fuse high-level and low-level information;
additional feature layers are added after ResNet-50 to generate a prediction feature pyramid;
a joint weight adjustment module is added before the output feature maps of the first three layers of the prediction feature pyramid to perform joint weight adjustment;
a K-means clustering algorithm is used to optimize the aspect ratios of the candidate boxes so that they suit the data set.
Further, the improved SSD object detection neural network model comprises a feature extraction module, a prediction-feature-pyramid module, and a prediction-feature module.
The feature extraction module extracts features with the first four convolutional stages of ResNet-50, and the feature fusion module fuses the outputs of the last three of those stages to obtain a fused feature map.
The prediction-feature-pyramid module takes the fused feature map as the first prediction feature layer and generates each of the remaining five prediction feature layers by appending an additional feature layer to the previous one, yielding a prediction feature pyramid of six prediction feature layers; a joint weight adjustment module is added before the output feature maps of the first three pyramid layers to perform joint weight adjustment.
The prediction-feature module generates a set of candidate boxes from each pyramid feature map in a sliding-window manner, with aspect ratios determined by K-means clustering so that they suit the data set, and with the number and ratios of boxes set as required. Each candidate box is then classified to decide whether it contains a target, with the class determined by the class score; bounding-box regression is then applied to the boxes that contain targets; finally, a non-maximum suppression algorithm removes redundant boxes to obtain the position and class of each predicted target.
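The last stage described above, non-maximum suppression, admits a compact sketch in plain Python. The box format `(xmin, ymin, xmax, ymax)`, the scores, and the 0.5 IoU threshold below are illustrative assumptions, not values taken from this description:

```python
def iou(a, b):
    # a, b: (xmin, ymin, xmax, ymax)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two heavily overlapping boxes plus one distant box: the lower-scoring
# duplicate is suppressed, the distant box survives.
print(nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)],
          [0.9, 0.8, 0.7]))  # [0, 2]
```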
Further, the feature fusion module applies 1×1 convolution and bilinear interpolation to the feature maps output by the second, third, and fourth convolutional stages of ResNet-50 (conv2_3, conv3_4, conv4_6) to bring them to the same scale, and then concatenates them along the channel dimension to obtain the fused features. These three stages consist of 3, 4, and 6 Bottlenecks respectively; each Bottleneck chains three convolutions (Conv1×1, Conv3×3, Conv1×1) and adds its input to its output through an identity-mapping branch with downsampling and a Conv1×1; each Bottleneck uses the Leaky-ReLU activation function with a BN (batch normalization) operation. Max pooling is added after the first convolutional layer of ResNet-50.
Each additional feature layer adopts the same structure as the Bottleneck.
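As a rough numeric illustration of the Bottleneck structure just described (Conv1×1 → Conv3×3 → Conv1×1), the weight count of a single block can be computed directly. The channel widths below are assumptions for illustration (a typical ResNet-50 conv2-stage block), and bias/BN parameters and the downsampling branch are ignored:

```python
def bottleneck_weights(c_in, c_mid, c_out):
    """Weight count of a 1x1 -> 3x3 -> 1x1 Bottleneck (main branch only,
    no bias/BN terms, no identity-branch downsampling convolution)."""
    reduce_ = c_in * c_mid * 1 * 1   # 1x1 channel reduction
    conv3 = c_mid * c_mid * 3 * 3    # 3x3 convolution
    expand = c_mid * c_out * 1 * 1   # 1x1 channel expansion
    return reduce_ + conv3 + expand

# Assumed widths: 256 in, 64 bottleneck, 256 out
print(bottleneck_weights(256, 64, 256))  # 69632
```

The 3×3 convolution dominates only mildly here; the 1×1 reduction is what keeps the block cheap compared with a plain 3×3 layer at full width (256·256·9 = 589,824 weights).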
Further, the joint weight adjustment module convolves the input feature map to obtain a feature map F; a channel weight adjustment module then re-weights the relative contributions of the channels of F, producing a feature map Mc(F); Mc(F) is multiplied element-wise with F to obtain a feature map F'; a position weight adjustment module then re-weights the relative contributions of the pixels of F', producing a feature map Ms(F'); finally, F' is multiplied element-wise with Ms(F') to obtain a feature map F″.
Further, the operation of the channel weight adjustment module is expressed as:
Mc(F) = σ{MLP[AvgPool(F)] + MLP[MaxPool(F)]}
where AvgPool() denotes average pooling, MaxPool() denotes max pooling, MLP[] denotes a multi-layer perceptron, and σ{} denotes the sigmoid function.
The operation of the position weight adjustment module is expressed as:
Ms(F') = σ{f7×7[AvgPool(F'); MaxPool(F')]}
where f7×7[] denotes convolution with a 7×7 kernel.
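The channel-weight formula above can be traced numerically in plain Python on a toy feature map. The identity "MLP" below is a deliberate simplification (the real module learns its MLP weights), and the tiny feature map is invented for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_weights(feature, mlp):
    """M_c(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))).
    feature: list of channels, each channel a flat list of pixel values."""
    avg = [sum(ch) / len(ch) for ch in feature]  # spatial average pooling
    mx = [max(ch) for ch in feature]             # spatial max pooling
    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]

# Hypothetical "MLP": the identity map, purely for illustration.
identity_mlp = lambda v: v

F = [[0.0, 0.0, 0.0, 0.0],   # an all-zero channel -> neutral weight 0.5
     [1.0, 2.0, 3.0, 2.0]]   # an informative channel -> weight above 0.5
w = channel_weights(F, identity_mlp)
print(w[0], w[1] > w[0])  # 0.5 True
```

The sigmoid maps each pooled response into (0, 1), so multiplying these weights back onto F (the point-wise product yielding F') amplifies informative channels relative to empty ones.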
Further, the prediction-feature module determines the aspect ratios of the candidate boxes with a K-means clustering algorithm, comprising the steps of:
1) obtaining the four coordinate values [Xmin, Ymin, Xmax, Ymax] of each ground-truth box in the data set, where Xmin and Ymin are the horizontal and vertical coordinates of the lower-left corner of the box and Xmax and Ymax those of the upper-right corner, and computing the corresponding aspect ratio from these four values;
2) initializing k preset cluster centers;
3) computing the intersection-over-union (IoU) of each ground-truth box with each cluster center in turn, and assigning the box to the cluster whose center gives the smallest distance 1 − IoU, i.e., the largest IoU;
4) after all ground-truth boxes have been assigned, recomputing the positions of the cluster centers;
5) checking whether the cluster-center positions have changed: if so, returning to step 3); otherwise taking the latest k cluster centers;
6) determining the aspect ratios of the candidate boxes from the k cluster centers.
Further, the aspect ratios of the candidate boxes comprise six values: 1, 2/3, 3/2, 4/5, 6/5, and 9/5.
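The clustering loop in steps 1)–6) can be sketched in plain Python. Here each ground-truth box is reduced to a (width, height) shape, shapes are compared by IoU with a shared corner, the distance is 1 − IoU, and the initial centers are passed in explicitly for determinism — all illustrative choices, not details taken from this description:

```python
def shape_iou(a, b):
    """IoU of two boxes given as (w, h), aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_shapes(boxes, centers, iters=100):
    """Cluster ground-truth (w, h) shapes with distance d = 1 - IoU,
    starting from the given initial centers; return center aspect ratios."""
    k = len(centers)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:  # step 3: assign to the center with the largest IoU
            j = max(range(k), key=lambda i: shape_iou(b, centers[i]))
            clusters[j].append(b)
        # step 4: recompute centers as the mean shape of each cluster
        new = [(sum(w for w, h in c) / len(c), sum(h for w, h in c) / len(c))
               if c else centers[i] for i, c in enumerate(clusters)]
        if new == centers:  # step 5: converged
            break
        centers = new
    return [w / h for w, h in centers]  # step 6: aspect ratios

# Two tight shape groups (near 1:1 and near 2:1) recover two ratios.
ratios = kmeans_shapes([(10, 10), (11, 10), (20, 10), (21, 10)],
                       [(10, 10), (20, 10)])
print(ratios)  # [1.05, 2.05]
```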
The present invention also provides an improved-SSD system for detecting illegal occupation of territorial cultivated land, applying the above method, comprising a model construction module, a data set construction module, a model training module, and a model application module.
The model construction module improves the SSD object detection neural network model to obtain the improved SSD object detection neural network model.
The data set construction module collects pictures of illegal occupation within territorial cultivated land, classifies the types of illegal occupation, and constructs the data set.
The model training module trains the improved SSD model on the data set to obtain the cultivated-land illegal-occupation detection model.
The model application module uses the detection model to detect illegal occupation in cultivated-land images and obtain detection results.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above neural-network-based high-precision detection method for illegal occupation of territorial cultivated land.
Addressing the drawbacks of traditional techniques, the method and system provided by the present invention combine high-definition video with deep learning: relying on video data collected in real time by high-definition cameras, an improved SSD object detection algorithm rapidly performs high-precision, real-time, intelligent detection of illegal occupation of cultivated land. First, SSD is switched to an adjusted ResNet-50 backbone; second, a feature fusion module (R50-FFM) is embedded to fuse high-level and low-level features, and Shortcut modules (additional feature layers) are added to generate the prediction feature pyramid; then, a joint weight adjustment module (JWAM) is added to enhance useful target information; finally, K-means clustering optimizes the aspect ratios of the candidate boxes so that they better suit the self-built data set. Experimental results show that the improved SSD model raises detection accuracy and markedly improves detection of small targets: the final model reaches an mAP of 91.24%, 5.74% higher than SSD, at a detection rate of 40.3 fps, and can serve as a reference for land-resource monitoring.
Brief Description of the Drawings
FIG. 1 is a framework diagram of the SSD algorithm provided by an embodiment of the present invention;
FIG. 2 is a framework diagram of the improved SSD algorithm provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the feature fusion module provided by an embodiment of the present invention;
FIG. 4 is a structural diagram of the R50-FFM provided by an embodiment of the present invention;
FIG. 5 is an internal structural diagram of the additional layers provided by an embodiment of the present invention;
FIG. 6 is a structural diagram of the joint weight adjustment module provided by an embodiment of the present invention;
FIG. 7 is a structural diagram of the channel weight adjustment module provided by an embodiment of the present invention;
FIG. 8 is a structural diagram of the position weight adjustment module provided by an embodiment of the present invention;
FIG. 9 is a flow chart of the K-means clustering algorithm provided by an embodiment of the present invention;
FIG. 10 is a visualization of the cluster centers provided by an embodiment of the present invention;
FIG. 11 is an example diagram of the SSD detection process provided by an embodiment of the present invention;
FIG. 12 is a comparison of visualized detection results provided by an embodiment of the present invention, where (a) corresponds to before the improvement and (b) to after the improvement;
FIG. 13 is a comparison of backbone-network accuracy and loss provided by an embodiment of the present invention;
FIG. 14 is a comparison of accuracy trends in the ablation experiments provided by an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to the accompanying drawings. The embodiments are given for illustrative purposes only and are not to be construed as limiting the present invention; the drawings are for reference and illustration only and do not limit the scope of patent protection, since many changes may be made without departing from the spirit and scope of the present invention.
Nowadays, towers are commonly erected on cultivated land, and image research based on footage from tower-mounted cameras offers good real-time performance, high clarity, strong stability, convenience, and flexibility. Deep learning, a prominent branch of AI, has advanced rapidly in computer vision and has greatly promoted the progress of AI as a whole.
Addressing the drawbacks of traditional techniques, this work combines high-definition video with deep learning: relying on video data collected in real time by high-definition cameras mounted on communication towers built by China Tower Corporation, the SSD object detection algorithm rapidly performs real-time intelligent detection of illegal occupation of cultivated land, and the algorithm is further improved to achieve satisfactory results. Making rural land planning intelligent greatly reduces the cost of manual inspection and improves government efficiency, which is significant for the future construction of smart cities and smart land administration.
Specifically, the neural-network-based high-precision detection method for illegal occupation of territorial cultivated land provided by an embodiment of the present invention comprises the steps of:
S1, improving the SSD object detection neural network model to obtain an improved SSD object detection neural network model;
S2, collecting pictures of illegal occupation within territorial cultivated land, classifying the types of illegal occupation, and constructing a data set;
S3, training the improved SSD model on the data set to obtain a cultivated-land illegal-occupation detection model;
S4, using the detection model to detect illegal occupation in cultivated-land images and obtain detection results.
(1) Step S1: Improving the SSD object detection neural network model
The SSD object detection neural network model, or SSD algorithm (Single Shot MultiBox Detector), is a classic single-stage object detection algorithm proposed by Wei Liu at ECCV 2016. It retains the anchor mechanism of Faster R-CNN; its framework is shown in FIG. 1. It uses VGG-16 as the feature extraction network, converts the fully connected layers into convolutional layers, and appends a series of extra convolutional layers to obtain predictions. Feature maps of different scales are used for detection: large-scale feature maps detect small objects, while small-scale feature maps detect large objects.
SSD300 takes a 300×300 input image. First, a pre-trained feature extraction network extracts features, yielding feature maps of different scales in six prediction feature layers: Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2, and Conv11_2. Second, a series of candidate boxes, whose number and ratios are set manually, is generated over each feature map in a sliding-window manner. Each candidate box is then classified to decide whether it contains a target, with the class determined by the class score; bounding-box regression further refines the localization of boxes that contain targets; finally, a non-maximum suppression (NMS) algorithm removes redundant boxes to obtain the position and class of each predicted target.
To improve detection accuracy, the present invention improves the SSD object detection neural network model; the architecture of the improved model is shown in FIG. 2. ResNet-50, which offers better feature extraction, stronger network representation, and relatively low computational cost, is adopted as the SSD backbone, and a series of extra layers is removed and added to build the full network. The residual modules ensure that the high-order semantic features of the image do not vanish as the network deepens, avoiding gradient vanishing and explosion. ResNet-50 is composed of a series of bottleneck layers (Bottlenecks); each Bottleneck chains three convolutions (Conv1×1, Conv3×3, Conv1×1) and adds its input to its output through an identity-mapping branch with downsampling and a Conv1×1.
The conv5_x stage of ResNet-50 and everything after it are removed, retaining the first four convolutional stages. Max pooling is added after the first stage (Conv7×7); the last three stages consist of 3, 4, and 6 Bottlenecks respectively; the ReLU activation is replaced with Leaky-ReLU, and a BN (batch normalization) operation is added.
As the convolutional network deepens, the semantic features it extracts become stronger, while the high-resolution shallow feature maps carry very strong localization information. Small targets occupy few pixels, and their semantic information gradually blurs after repeated downsampling. SSD treats its six feature maps of different depths equally and therefore cannot fully exploit the complementary strengths of local detail features (localization) and global semantic features (recognition). This embodiment therefore uses a feature fusion module (R50-FFM), different from FPN, to combine shallow detail features with high-level semantic features, so that every feature map carries strong localization and recognition information, enhancing the detection of small targets.
As shown in FIG. 3, this embodiment uses channel concatenation (Concat) for feature fusion, which integrates semantic information across levels and spatial scales: each feature map undergoes a 1×1 convolution and bilinear interpolation so that corresponding elements of the high-level and low-level feature maps can be concatenated along the channel dimension.
A feature fusion function φf fuses the feature maps Xi, each first adjusted to a uniform scale by a scale-adjustment function ζi, into Xf; this allows a feature pyramid function φp to generate a new feature pyramid X'p on top of the fusion layer Xf, after which a function φc,l predicts the target location (loc) and classification (class) from the pyramid. The process can be expressed as:
Xf = φf[ζi(Xi)], X'p = φp(Xf), (class, loc) = φc,l[∪(X'p)]
where i ∈ (2-3, 3-4, 4-6) corresponds to the last three of the first four convolutional stages of ResNet-50. The function φp generates six feature pyramids (feature maps) of different scales, predicting 8,732 bounding boxes in total; when predicting, the function φc,l together with NMS and related methods obtains the final target positions and classes from those 8,732 boxes. ∪(X'p) denotes the bounding-box generation operation over the newly generated feature pyramid X'p.
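The figure of 8,732 predicted bounding boxes is consistent with the six pyramid scales and the usual SSD300 default-box counts per cell (4, 6, 6, 6, 4, 4 — an assumption here, since the description does not list them). A quick arithmetic check:

```python
# (feature map side, default boxes per cell) for the six SSD300 layers;
# the per-cell counts are the conventional SSD300 values, assumed here.
layers = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
total = sum(side * side * per_cell for side, per_cell in layers)
print(total)  # 8732
```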
In this embodiment, the feature fusion module is embedded in the ResNet-50 backbone of SSD, and additional feature layers are appended after it to generate the feature fusion pyramid. The conv2_3, conv3_4, and conv4_6 stages of ResNet-50 are adjusted to the same scale for Concat connection, realizing multi-scale fusion of high-level and low-level features. Specifically, as shown in FIG. 4, these three stages produce, after convolution, feature maps of sizes 75×75×256, 38×38×512, and 19×19×1024 respectively, which are then unified to 38×38×1024. The feature map processed by the BN layer serves as the first SSD prediction feature layer, Feature Map1.
As shown in FIG. 5, which depicts the internal structure of the additional layers, this embodiment further appends five Shortcut modules (Bottleneck-style operations) in sequence as additional layers to generate the remaining five SSD prediction feature layers, Feature Map2 to Feature Map6, with scales 19×19×1024, 10×10×512, 5×5×512, 3×3×256, and 1×1×256. Feature Map1 to Feature Map6 together form the feature fusion pyramid serving as the feature maps of the SSD prediction feature layers.
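The pyramid's spatial sizes (38 → 19 → 10 → 5 → 3 → 1) are consistent with stride-2 3×3 convolutions in the additional layers, which can be checked with the standard convolution output-size formula; the per-layer padding values below are illustrative assumptions, not details taken from this description:

```python
def conv_out(size, kernel=3, stride=2, pad=1):
    """Standard convolution output-size formula:
    out = floor((in + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

sizes = [38]
for pad in (1, 1, 1, 1, 0):  # assumed padding for each additional layer
    sizes.append(conv_out(sizes[-1], pad=pad))
print(sizes)  # [38, 19, 10, 5, 3, 1]
```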
The Joint Weight Adjustment Module (JWAM) can be used as a small plug-in inside a convolutional neural network: it is independent of the rest of the network model and is widely applied in object detection, image recognition, and related fields. Inspired by how humans process visual information, it attends to useful information with high weights, suppresses irrelevant information with low weights, and continuously adjusts the weights so that more of the important information is retained under varying conditions, preserving the usefulness of the convolutional layers' spatial and channel responses and making localization and classification more accurate.
In this embodiment, a joint weight adjustment module is added before the output feature maps of the first three layers of the prediction feature pyramid. The structure of the module is shown in Figure 6. The module first convolves the input to obtain feature map F; the channel weight adjustment module then re-weights the relative contribution of each channel of F, yielding Mc(F); Mc(F) is multiplied element-wise with F to obtain F'; the position weight adjustment module then re-weights the relative contribution of each pixel of F', yielding Ms(F'); finally, F' is multiplied element-wise with Ms(F') to obtain the output feature map F''.
The Channel Weight Adjustment Module (CWAM) attends to channel information of the target and re-weights the relative contribution of each channel of the image. As shown in Figure 7, the module applies spatial max pooling and average pooling on each channel of feature map F. The max- and average-pooled values, arranged in channel order, form two result vectors, which are each fed into a two-layer fully connected network (a multi-layer perceptron, MLP) for dimensionality reduction and restoration. Finally, the two outputs are summed and activated by the σ (sigmoid) function to produce the channel adjustment vector MC, i.e., the weight of each channel. The process is expressed as:
where W0 and W1 are the weights of the two fully connected layers.
The Position Weight Adjustment Module (PWAM) attends to positional information of the target and re-weights the relative contribution of each pixel of the image. As shown in Figure 8, the module applies max pooling and average pooling at the same position across all channels of feature map F', then concatenates the two pooling results into a feature map of shape (W, H, 2) (W the width, H the height, 2 the number of channels). Finally, a 7×7 convolution kernel is applied, and the convolved feature map is activated by the σ function to obtain the position adjustment vector MS. The process is expressed as:
The output MC of the channel module is multiplied element-wise with F to obtain F', which serves as the input to the position module; the two modules connected in this way constitute the joint weight adjustment module. The input feature map F, after joint weight adjustment, yields the final output feature map F'', expressed as:
Because feature maps with resolution below 10×10 carry increasingly blurred information, JWAM is inserted only after the first three feature maps to attend to target information.
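The F → Mc(F)⊗F → Ms(F')⊗F' data flow described above can be sketched in plain Python on a tiny tensor represented as nested lists. This is a structural sketch only: the two-layer MLP of the channel branch and the 7×7 convolution of the position branch are deliberately omitted (the pooled descriptors are summed directly before the sigmoid), so the numerical weights here are illustrative, not those of the trained module.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cwam(F):
    # Channel weights from per-channel spatial max- and average-pooling
    # (the two-layer MLP is omitted in this sketch).
    weights = []
    for ch in F:
        flat = [v for row in ch for v in row]
        weights.append(sigmoid(max(flat) + sum(flat) / len(flat)))
    return weights

def pwam(F):
    # Position weights from cross-channel max- and average-pooling at each
    # pixel (the 7x7 convolution on the 2-channel map is likewise omitted).
    H, W = len(F[0]), len(F[0][0])
    return [[sigmoid(max(ch[i][j] for ch in F) +
                     sum(ch[i][j] for ch in F) / len(F))
             for j in range(W)] for i in range(H)]

def jwam(F):
    # F'' = Ms(F') ⊗ (Mc(F) ⊗ F): channel re-weighting, then position re-weighting.
    mc = cwam(F)
    Fp = [[[v * mc[c] for v in row] for row in ch] for c, ch in enumerate(F)]
    ms = pwam(Fp)
    return [[[Fp[c][i][j] * ms[i][j] for j in range(len(ms[0]))]
             for i in range(len(ms))] for c in range(len(F))]
```

Because each adjustment factor is a sigmoid output in (0, 1), every element of F'' has smaller magnitude than the corresponding element of a positive-valued F, which is exactly the suppression of low-weight information described above.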
Candidate-box generation requires setting the scale sk and the aspect ratio ar.
The scale sk is the size of a candidate box relative to the input image. sk follows a linear-increase rule:
Since the first layer is set independently, m is taken as 5 and the first-layer scale is set to smin/2 = 0.1. smin and smax are the lower and upper limits of the scale, taken as 0.2 and 0.9; the candidate-box scales of layers 2 through 6 increase linearly according to Equation (5).
Six aspect ratios ar ∈ {1, 1*, 2, 1/2, 3, 1/3} are used to generate candidate boxes. When ar = 1, square bounding box 1 with scale sk is generated; when ar = 1*, square bounding box 2 at an additional scale is generated. The actual width and height of each candidate box are:
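Under the linear-increase rule with the limits given above, the six layer scales can be computed directly. For the box dimensions, the sketch below uses the standard SSD relation w = s·√ar, h = s/√ar; the source's own width/height formula is not reproduced in the text, so that relation is an assumption:

```python
import math

def layer_scales(s_min=0.2, s_max=0.9, m=5, first_layer=0.1):
    # Layer 1 is set independently to s_min/2; layers 2..6 increase linearly.
    scales = [first_layer]
    scales += [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]
    return scales

def box_wh(s, ar):
    # Assumed SSD relation: width grows with sqrt(ar), height shrinks with it,
    # so the box area s*s is preserved across aspect ratios.
    return s * math.sqrt(ar), s / math.sqrt(ar)

print([round(s, 3) for s in layer_scales()])  # [0.1, 0.2, 0.375, 0.55, 0.725, 0.9]
```

Note that each of the six prediction layers pairs one scale with the full set of aspect ratios, so box area stays constant within a layer while shape varies.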
Candidate boxes are matched to ground truth (GT) as follows: the IOU of each candidate box with the GT is computed in turn; if it exceeds a threshold (typically 0.5), the candidate box matches that GT, is responsible for predicting it, and becomes a positive sample; otherwise it is a negative sample. To avoid positive/negative imbalance, SSD subsamples the negatives, selecting them in order of confidence error so that the positive-to-negative ratio is held at 1:3.
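The matching rule and the 1:3 hard-negative mining can be sketched as follows. Here `ious[i]` is assumed to hold candidate box i's best IOU against any ground-truth box and `conf_errs[i]` its confidence error; in a real pipeline both would come from the matcher and the network's loss:

```python
def match_and_mine(ious, conf_errs, iou_thresh=0.5, neg_pos_ratio=3):
    """Split candidate boxes into positives and mined hard negatives."""
    positives = [i for i, v in enumerate(ious) if v >= iou_thresh]
    negatives = [i for i, v in enumerate(ious) if v < iou_thresh]
    # Hard-negative mining: keep the negatives with the largest confidence
    # error, capped at neg_pos_ratio times the number of positives.
    negatives.sort(key=lambda i: conf_errs[i], reverse=True)
    keep = neg_pos_ratio * max(len(positives), 1)
    return positives, negatives[:keep]
```

With one positive match, at most three of the highest-error negatives are retained, which is the 1:3 sampling described above.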
The candidate boxes preset by the SSD algorithm were originally designed for the Pascal VOC2007 dataset, with scales biased toward large and medium targets; since predicted boxes are decoded from the candidate boxes, they cannot satisfy the detection requirements for small targets. The candidate-box aspect ratios therefore need to be further adjusted for the self-built dataset, raising the matching probability between targets and candidate boxes and further improving detection accuracy.
The K-means clustering algorithm is used to optimize the candidate-box aspect ratios, obtaining ratios better suited to the dataset in this work. The steps are:
1) Obtain the four coordinate values [Xmin, Ymin, Xmax, Ymax] of each ground-truth box in the dataset, where Xmin, Ymin are the horizontal and vertical coordinates of the lower-left corner and Xmax, Ymax those of the upper-right corner, and compute the corresponding aspect ratio from them;
2) Initialize by presetting k cluster centers;
3) Compute the distance from each ground-truth box to each cluster center in turn, and assign the box to the cluster at minimum distance;
4) After all ground-truth boxes are assigned, recompute the positions of the cluster centers;
5) Check whether any cluster-center position changed; if so, return to step 3), otherwise the latest k cluster centers are obtained;
6) Determine the candidate-box aspect ratios from the k resulting cluster centers.
Traditional clustering uses the Euclidean distance between a ground-truth box and a cluster center, but this metric yields larger errors for larger boxes; the IOU value is therefore used in place of the Euclidean distance to avoid this problem.
The IOU-based distance is defined as follows:
where IOU is the intersection-over-union, G the ground-truth box, K the cluster box, C the cluster center, and D the distance metric; the smaller D, the closer the ground-truth box is to the cluster box.
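The six steps above with the IOU distance D = 1 − IOU can be sketched as below. Boxes are represented by (width, height) and aligned at a common corner, the usual convention for this kind of anchor clustering; the random-sample initialization is an illustrative choice, not anything specified in the source:

```python
import random

def iou_wh(a, b):
    # IOU of two (w, h) boxes sharing a corner: only their sizes matter.
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_iou(boxes, k, iters=50, seed=0):
    centers = random.Random(seed).sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # D = 1 - IOU, so assign each box to the center of maximum IOU.
            clusters[max(range(k), key=lambda j: iou_wh(b, centers[j]))].append(b)
        new_centers = [
            (sum(b[0] for b in c) / len(c), sum(b[1] for b in c) / len(c))
            if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:  # step 5: centers stopped moving
            break
        centers = new_centers
    return centers
```

Running this with k = 9 over the ground-truth (width, height) pairs of the self-built dataset would yield cluster centers analogous to those reported below, from which the aspect ratios are read off.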
Computation shows that the average IOU is highest at K = 9, i.e., an accuracy of 74.68%; the corresponding cluster centers (width, height) are (31.875, 42.15), (49.82, 75.05), (91.63, 81), (66.5, 129), (142.65, 116.94), (102.92, 173.25), (168, 207.1), (255.9, 156.4), and (256.82, 262.88), as shown in Figure 10. After analysis of these data, 1, 2/3, 3/2, 4/5, 6/5, and 9/5 are finally selected as the candidate-box aspect ratios.
(2) Step S2: Create the dataset
The monitoring of illegal occupation of cultivated land in this work is based on the ROI (region of interest) delineated by the Bureau of Land and Resources and does not cover pre-existing occupations. Common occupation behaviors within national cultivated land are divided into feature categories to build the dataset. Images are captured from video, saving 10 images per hour and discarding those without occupation targets; the images are uniformly cropped to 640*550 and de-noised with the Retinex defogging algorithm. Finally, 859 images covering 15 categories (6 types of engineering vehicles, 5 types of poultry, 2 types of buildings, fish ponds, and trees) are selected to build the dataset.
The image dataset was expanded six-fold to 5154 images using random combinations of one or more of flipping, rotation, filtering, and color-space conversion, then randomly divided into training, validation, and test sets in an 8:1:1 ratio, as shown in Table 1. The labelimg annotation tool was used to label the images, producing Annotations label files, and the dataset was organized in VOC format for training of the improved SSD model.
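The 8:1:1 random split can be sketched as below; the seed and the exact rounding of the split sizes are illustrative choices, not taken from the source:

```python
import random

def split_dataset(items, ratios=(8, 1, 1), seed=42):
    """Shuffle and split items into train/val/test by the given ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

train, val, test = split_dataset(range(5154))
print(len(train), len(val), len(test))  # 4123 515 516
```

With 5154 images, floor division gives 4123/515/516, which is what an 8:1:1 split of this dataset works out to.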
Table 1. Dataset division
(3) Step S3: Model training
The experiments run on the Windows platform with an AMD Ryzen 7 4800H CPU and an NVIDIA GTX 1650 Ti GPU. The programming language is Python 3.9; the environment is torch 1.10, torchvision 0.11.0, and CUDA 10.2. During training, the weights of the improved SSD are initialized from weights obtained by training the traditional SSD on the VOC 2007 dataset. The training parameters are listed in Table 2.
Table 2. Training parameter configuration
The mAP (mean Average Precision) value is used to evaluate the model; it jointly accounts for classification and localization and is a key performance indicator in object detection. In the SSD algorithm, the following definitions are based on the relation between ground-truth boxes and generated prediction boxes.
TP (true positive): actually a target, and the IOU with the generated prediction box exceeds the threshold; the target is predicted correctly. TN (true negative): actually background, and no prediction box is generated; the background is predicted correctly. FP (false positive): background wrongly predicted as a target, i.e., a false detection. FN (false negative): a target wrongly predicted as background, i.e., a missed detection.
Precision (P) is the proportion of samples predicted as positive that are predicted correctly:
Recall (R) is the proportion of actually positive samples that are predicted correctly:
AP (Average Precision) describes the average precision of a single category and equals the integral of precision with respect to recall over [0, 1].
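With the TP/FP/FN definitions above, P and R follow directly, and AP is the area under the precision-recall curve. The all-point interpolation below (taking the monotone precision envelope before integrating) is the common VOC-style evaluation and is an assumption here, since the source only states that AP integrates P over R on [0, 1]:

```python
def precision_recall(tp, fp, fn):
    # P = TP / (TP + FP); R = TP / (TP + FN)
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recalls, precisions):
    """Area under the P-R curve with a monotone precision envelope."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])  # make precision non-increasing in recall
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))
```

mAP is then the mean of the per-category AP values over all 15 categories of the dataset.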
(4) Step S4: Model application
The trained detection model for illegal occupation of territorial cultivated land is applied to cultivated-land images generated in real time to detect illegal occupation and obtain the detection results.
To facilitate the application of the above method, this embodiment also provides an improved-SSD detection system for illegal occupation of territorial cultivated land, comprising a model construction module, a dataset construction module, a model training module, and a model application module.
The model construction module improves the SSD object detection neural network model to obtain the improved SSD object detection neural network model;
the dataset construction module collects images of illegal occupation behaviors within national cultivated land, classifies the occupation types, and constructs the dataset;
the model training module trains the detection model on the dataset to obtain the illegal-occupation detection model;
the model application module applies the detection model to territorial-cultivated-land images to detect illegal occupation and obtain the detection results.
This embodiment also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the neural-network-based high-precision method for detecting illegal occupation of territorial cultivated land.
(5) Experiments
The dataset settings and model training in the experiments are the same as above.
1. Detection results before and after the improvement
Figure 11 shows some of the feature maps generated during SSD's learning on a detection image; SSD finally localizes the target and outputs the highest-scoring category. To evaluate the improved algorithm and its effect on small targets, four dataset images are compared. As Figure 12 shows, for large and medium targets occupying many pixels, such as large engineering vehicles, detection is essentially unchanged before and after the improvement; for small targets with little semantic content, such as sheep photographed from high altitude, the original SSD misses many targets and even produces false detections (in Figure 12(a), sheep are misdetected as cattle). With the improved SSD, the missed-detection rate for small targets is reduced and their detection is enhanced.
2. Comparison with other object detection algorithms
To verify that the algorithm improves detection accuracy, it is compared with other advanced object detection algorithms on the same self-built dataset, using mAP and detection speed as metrics; the results are shown in Table 3.
Table 3. Performance comparison of different algorithms
Table 3 shows that, compared with the two-stage Faster R-CNN, the improved algorithm raises FPS by 34.08, demonstrating the speed advantage of single-stage algorithms, and raises accuracy by 2.22%; compared with DSSD, a classic improved SSD, accuracy rises by 2.64% and FPS by 24.7; compared with the lightweight, speed-oriented YOLOv4-tiny, accuracy rises by 5.06% while FPS drops by 33.2. The experiments show that the algorithm offers good overall performance relative to the alternatives and meets the practical requirement of real-time detection of illegal occupation of cultivated land.
3. Ablation study
First, the backbone network was replaced and trained on the same dataset for 200 epochs under the parameter configuration above, with an evaluation every 10 epochs. The mAP and training-loss curves are compared in Figure 13: after replacing the backbone, mAP rises and the loss falls, showing that the ResNet-50 network has stronger feature extraction capability and prevents network degradation. As Table 4 shows, mAP and detection speed improve by 1.32% and 2.2 FPS respectively, because the pruned ResNet-50 preserves network depth while reducing the parameter count.
Table 4. Backbone network comparison
To verify the effectiveness of each improvement, ablation experiments were continued on top of the replaced backbone, progressively adding the R50-FFM and JWAM modules and the K-means optimization, and comparing the contribution of each. The self-built dataset was used throughout, with mAP and detection speed as metrics. Figure 14 compares the accuracy trends and Table 5 lists the results. Embedding R50-FFM alone raises mAP by 2.4%, confirming the benefit of fusing deep semantic with shallow detail information: compared with a single feature map, the fused multi-scale maps carry more useful information, while the added network parameters lower FPS by 5.6. On this basis, adding JWAM raises mAP by 1.41%, showing that it directs the network to target-bearing regions and improves detection accuracy, at a further cost in FPS. The K-means optimization raises mAP by a further 0.49%, confirming the benefit of optimizing the candidate-box ratios and raising the target/candidate matching probability; FPS rises by 1.3, since better-fitting candidate boxes also speed up detection. With all improvements combined, the final mAP is 91.24%, 4.29% higher than the ResNet-50 baseline, confirming the effectiveness of the improvements.
Table 5. Ablation results
In summary, addressing the problem of illegal occupation of territorial cultivated land, the invention monitors delineated regions using video from tower-mounted cameras combined with an SSD object detection algorithm, improving the efficiency of law enforcement departments. The SSD algorithm is further optimized for land-resource monitoring: the backbone is replaced, deepening the network while reducing parameters and strengthening robustness and feature extraction; a feature fusion module gives both high and low layers strong semantic and localization information, improving small-target detection; a weight adjustment module suppresses irrelevant information; and K-means optimization of the candidate-box ratios further raises accuracy. Experiments show detection accuracy rising from 85.8% to 91.24%, fully demonstrating the algorithm's effectiveness. Compared with other object detection algorithms it also offers higher overall performance, meets practical engineering needs, and is well suited to land-resource monitoring.
The above embodiment is a preferred implementation of the invention, but implementations of the invention are not limited to it; any change, modification, substitution, combination, or simplification made without departing from the spirit and principles of the invention is an equivalent replacement and falls within the scope of protection of the invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410153262.5A CN118072163A (en) | 2024-02-02 | 2024-02-02 | Neural network-based method and system for detecting illegal occupation of territorial cultivated land |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN118072163A true CN118072163A (en) | 2024-05-24 |