Detection method for identifying small targets
Technical Field
The invention belongs to the field of target detection and identification and relates to a real-time small target detection and identification model with a double-head PANet structure based on a parallel strategy. It is suitable for scenes in which the targets in the image are small and therefore difficult to detect and identify.
Background
Object detection and recognition is an important research direction in computer vision. With the wide application of artificial intelligence in computer vision, object detection has drawn growing attention as one of its representative problems, particularly since the introduction of deep learning methods. Detection and recognition methods have developed rapidly, especially in complex image processing tasks such as scene understanding, target tracking, event detection, damage identification and remote sensing image processing. Target detection technology has moved from traditional image feature extraction operators combined with machine learning methods to deep learning methods, and detection performance has improved accordingly. As a basic component of complex vision applications, target detection and identification aims to detect the targets of interest in a given image or video, return their position coordinates, classify their type, and output a detection probability. Within target detection, small targets remain a difficulty in practical applications, as they do in other visual tasks such as semantic segmentation: the detection precision for small targets is usually only about half of that for large targets, even though small targets are numerous in the COCO data set. A small target has low resolution, appears blurred and carries little information, so its feature expression capability is weak; very few features can be extracted from it, which is why small targets are difficult to detect.
Most small objects are concentrated in a small number of pictures. As a result, for roughly half of the training time the model does not learn the characteristics of small targets. In deep learning target detection, low resolution, scarce information, heavy noise and blurred pictures have always been a practical and common difficulty. A small target is small in size, has few features in total, lacks detail information and has blurred local features, so it is hard to extract effective features that a computer can understand. How to extract information from an image for computer understanding is a central problem of machine vision. A deep convolutional neural network extracts target features through convolution operations. The deeper features it extracts have a larger receptive field and rich semantic information, and are to some extent robust to pose change, occlusion and local deformation of the object. The shallow features it extracts have a small receptive field and rich geometric and texture information, but lack abstract semantic information. For a small target, the shallow features contain some of its detail information; however, as the number of layers and the receptive field increase, the geometric detail in the extracted features may disappear completely, so detecting a small object from deep features becomes difficult. In addition, it is uncertain in which layers the shallow and deep features of small targets are distributed, so these features cannot be selected in advance.
Therefore, in order to improve the small target classification capability of the network, both the target shallow features and the deep features should be extracted to preserve the available information to the maximum extent.
Algorithms for detecting and identifying small targets mostly adopt a strategy of retaining multi-scale information, extracting multi-scale features with an image pyramid or a feature pyramid. In traditional algorithms, an image pyramid is generally used to build versions of the original image at different resolutions, and a classifier with a fixed input resolution is then slid over each pyramid layer so that small targets are detected at the bottom of the pyramid. Alternatively, classifiers with different resolutions are applied to the original image so that small targets are detected by the classifier with the smaller resolution, but this method is slow. Although building the image pyramid can be accelerated with separable convolution kernels, or the image can simply be resized directly, feature extraction still has to be performed multiple times. Classical feature extraction and classification methods, such as the Haar feature extraction operator combined with a cascaded AdaBoost algorithm, or the DPM target recognition framework composed of HOG features and an SVM, all use an image pyramid to process multi-scale targets. However, these conventional methods require a feature extraction operator to be selected manually for each pyramid layer, and they cannot meet practical requirements for robustness, running time and memory consumption. The crudest approach is to handle low-resolution pictures directly with a model for high-resolution pictures, by up-sampling. However, this presumes that training samples of every scale are equally numerous and rich, as under ideal conditions; in practice, many data sets severely lack small-target samples.
In recent years, deep learning models have gradually replaced traditional machine vision methods and become the mainstream algorithms in target detection. Among them, the algorithms with better real-time performance include SSD (Single Shot MultiBox Detector) and YOLO (You Only Look Once). SSD is an early integrated single-stage model that achieves high accuracy while being an order of magnitude faster than two-stage models. SSD uses feature maps of different strides as detection layers to detect targets of different scales. This way of handling scale is simple and effective, but it has some defects: small targets are generally detected on low-level layers, whose receptive field is small and which lack context information, so false detections are easily introduced; and the multi-scale information available to a single detection layer is limited, although the target scale varies widely in many tasks.
The YOLO series of algorithms provides a new approach to single-stage target detection and identification, with an obvious speed advantage. However, the earlier YOLO versions have some problems: the grid division is coarse, and the limited number of target boxes generated by each grid cell restricts the detection of small-scale objects and of objects close to one another. To address these defects the YOLO algorithm has been further optimized and iterated up to the current YOLOv4 version. YOLOv4 is a single-stage target detection and identification algorithm that integrates multiple optimization strategies on the basis of YOLOv3. By collecting effective performance enhancement strategies, YOLOv4 proposes the Mosaic data enhancement technique, updates the backbone network to the CSPDarknet53 structure, and adds an SPP structure, a PANet structure and so on, improving the small target detection and identification performance of the network while keeping good real-time performance. The PANet is a novel pyramid attention module embedded between the backbone network of YOLOv4 and the detection head; it captures the relationships between multi-scale features to improve detection and identification performance. However, in the task of detecting and identifying small targets from an unmanned aerial vehicle, YOLOv4 still performs insufficiently and cannot acquire enough useful small target features. Existing deep-learning-based small target detection methods improve a mainstream target detection model to raise its small target detection performance. According to the improvement idea, these methods can be divided into five categories, based respectively on multi-scale prediction, improved feature resolution, context information, data enhancement, and new backbone networks and training strategies.
In recent years, with the further development of unmanned aerial vehicle (UAV) technology, UAVs have been widely applied in traffic, entertainment, environmental monitoring, disaster early warning, animal protection, the military and other fields. Once a UAV carries a sensing device such as a camera, it can acquire images and other information for target detection. UAV target detection and identification is made difficult by small target sizes, complex flight backgrounds and the appearance of interfering targets. The UAV flies far above the ground, the target sizes are widely distributed, and the targets to be identified have the characteristics of small targets. A small target often has few pixels and blurred, monotonous features, and factors such as occlusion, insufficient light and interference from similarly shaped objects easily lead to missed or false detections. In the UAV detection and identification task, therefore, the sizes of targets of the same or different types vary widely and a large number of small-scale targets exist. Detection and identification models designed for targets of conventional size acquire little detail information about small targets, so their precision is low on UAV aerial images. In existing real-time small target detection and identification network models, the feature map scale changes greatly and the transfer path of the shallow features with a small down-sampling rate is long, so detail information is easily lost and the detection precision for small targets is reduced.
All of this makes UAV target detection and identification more difficult. How to detect and identify small-scale targets in images acquired by a UAV is a key technical problem to be solved in the field. Object detection requires locating the position of an object and identifying its class. Because the flying height of a UAV varies greatly and the viewing angle of the acquired images is variable, targets in UAV images show large scale changes, many small targets, complex background features and heavy occlusion, which makes UAV image target detection difficult. UAV target detection and identification also demands high real-time performance, which ordinary moving-object detection algorithms fall well short of. A reference in "Advances in Laser and Optoelectronics", 2017, 54: 111002, discloses a UAV aerial image positioning method based on YOLOv2, in which targets in UAV aerial images are detected and positioned with the YOLOv2 algorithm; the average accuracy of target detection and identification is improved to 79.5%, and the positioning accuracy of the model is more than 84%. Although the detection effect is good, the real-time performance of the method is hardly sufficient for practical application. Lianmo, in "Industrial Control Computer", 2018, 31(9): 46-49, discloses a damaged manhole cover detection method based on UAV aerial images, and Hencha, Liu science discloses research on UAV autonomous target tracking based on airborne machine vision.
The manhole covers in the UAV aerial images are detected with YOLOv2 at an accuracy of 82.6%; although the performance of detecting broken manhole covers meets the requirement on the experimental data set, the precision drops markedly on data sets of complex scenes. Compared with conventional targets, targets photographed by a UAV are more easily affected by factors such as flying height, shooting angle and weather, and the background model is easily disturbed when the onboard gimbal camera moves. The same target can therefore differ greatly in size and visual features across images, and a large number of small targets exist, so false and missed detections occur and the detection precision of the model deteriorates markedly. Conventional target detection and identification models acquire little detail information about small targets, and their precision is low on UAV aerial images.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and the high difficulty of small target detection by providing a small target detection and identification method with a high recognition rate, a small amount of computation, strong robustness and higher detection precision on the target data set, thereby improving small target detection and identification.
The technical scheme of the invention is as follows. A detection method for identifying small targets, characterized in that: a real-time small target detection and identification network is formed from three parts, namely a backbone network structure for extracting data features, a path aggregation network (PANet) serving as the network neck to obtain features at a plurality of down-sampling scales, and a network detection head for estimating target information; the PANet structure containing a feature pyramid is optimized, and the original PANet structure is expanded by using a parallel strategy; the feature skip-connection mode in the feature pyramid is changed so that larger-scale features are transmitted from the backbone network to the detection head; the PANet structure under the parallel strategy is changed into a double-head structure, and the improved pyramid attention module is connected with a four-layer Bottom-up Path Augmentation structure; the input of the network detection head is changed into four feature maps of different scales, features are aggregated across layers, and the feature maps of different scales are fused to obtain the final target detection and identification result; the feature fusion mode of the small target detection model YOLOv4 is adopted while down-sampling feature extraction is performed on the input image, with the addition of adjacent layers changed into the splicing (concatenation) of adjacent layers; the neck connecting the backbone network and the detection head is the improved parallel PANet structure, which is connected to both the backbone network and the network detection head; the network detection head adopts the detection head of the pre-trained YOLOv3 model to predict the type and position of a target, so that the detector focuses on small objects in order to identify small targets; more data are generated through data enhancement, new enhanced images being generated from the basic data set, and the resolution of image acquisition is increased; the feature skip mode in the feature pyramid is changed to improve feature extraction for small-scale targets, the multi-scale features of small targets are extracted and fused, irrelevant categories are filtered out, and the multi-scale features of small targets are retained; down-sampling feature extraction is performed on the input image by the backbone network structure of the small-scale target detection and identification model; and each grid cell of the image predicts a conditional probability value for each category, the prediction result is obtained directly from the image, and the anchor-based detection head outputs the target detection information.
Compared with the prior art, the invention has the following beneficial effects:
The invention realizes a real-time small target detection and identification network based on an improved double-head PANet. A YOLO backbone network, a PANet serving as the network neck to obtain features at a plurality of down-sampling scales, and a network detection head for estimating target information together form the real-time small target detection and identification network (FIG. 1). The improved PANet pyramid attention module is embedded between the backbone of the model and the network detection head, and the difficulty of extracting small-scale target features is reduced by changing the feature skip-connection mode in the feature pyramid. The network detection head adopts the anchor-based detection head of the YOLOv3 model and outputs the target detection information. The neck connecting the backbone and the detection head adopts the improved double-head PANet structure, which increases the feature information acquired by the detection head without increasing the depth of the PANet; the target recognition rate can be greatly improved, and the feature loss caused by a deeper network structure is reduced.
By analyzing the structural characteristics and applications of the conventional PANet, and in view of problems such as the prominence of small targets in the samples, the invention adopts an improved double-head PANet structure at the neck connecting the backbone of the detection network and the detection head. The labelled target boxes provide prior knowledge for network training, and the backbone network structure extracts and fuses multi-scale features; the number of multi-scale features is increased, and the false detection and missed detection rates are low. Analysis of the experimental results shows that the improved small-scale target detection and identification network structure can overcome the negative effect of multi-scale changes of the target and detect small targets better, with a small amount of computation. The resolution of the training image is improved compared with v1.
The method adopts a parallel connection strategy to expand the original three-layer PANet structure into four layers as the improved structure, connects four scale features of the original YOLOv4 model into the PANet, and changes the Bottom-up Path Augmentation structure into four layers that are connected with the FPN in the PANet. The improved structure is embedded into the YOLOv4 network structure and replaces the original neck of the network, which increases the diversity of the features obtained by the detection head of the target detection network. Expanding the original PANet structure with a parallel strategy shortens the feature transfer distance while increasing the multi-scale feature information, and reduces the vanishing-gradient phenomenon. Experiments show that the structure achieves a higher average precision than the original structure and is robust especially for small-scale instances and their boundary information. Compared with YOLO, YOLO9000 is greatly improved in the number of identifiable categories, precision, speed, positioning accuracy and so on.
The invention starts from the PANet (Path Aggregation Network) structure, which is based on the feature pyramid idea, and improves the detection performance for small targets by changing the number of levels and the connection mode of the PANet. To verify the effect of the improved structure, small targets are detected and identified in combination with the real-time target detection and identification models YOLOv4 and YOLOv5. Experiments show that the improved hierarchical structure of the PANet greatly helps the detection performance for small-scale targets, which can be detected better.
The invention improves the detection performance for small targets by changing the number of levels and the connection mode of the PANet. The neck connecting the backbone of the detection network and the detection head adopts an improved parallel two-branch PANet structure, which is connected to both the backbone network and the network detection head; the network detection head adopts the detection head of the pre-trained YOLOv3 model to predict the type and position of a target, so that the detector focuses on small objects in order to identify small targets; the feature skip mode in the feature pyramid is changed to improve feature extraction for small-scale targets, the multi-scale features of small targets are extracted and fused, irrelevant categories are filtered out, and the multi-scale features of small targets are retained. To verify the effect of the improved structure, small targets are detected and identified in combination with the real-time target detection and identification models YOLOv4-csp and YOLOv5l. Experiments show that the improved hierarchical structure of the PANet greatly helps the detection performance for small-scale targets, which can be detected better.
Drawings
FIG. 1 is a schematic diagram of the improved parallel two-branch PANet structure of the present invention;
FIG. 2 is a schematic diagram of the PANet feature fusion mode in the original YOLOv4 model;
FIG. 3 is a diagram of the structure of the PANet in the original YOLOv4/v5 network;
FIG. 4 is a schematic diagram of the four-layer PANet structure.
The invention is further illustrated with reference to the following figures and examples.
Detailed Description
See FIGS. 1-4. According to the invention, a real-time small target detection and identification network is formed from three parts: a backbone network structure for extracting data features, a path aggregation network (PANet) serving as the network neck to obtain features at a plurality of down-sampling scales, and a network detection head for estimating target information. The PANet structure containing a feature pyramid is optimized, and the original PANet structure is expanded by using a parallel strategy; the feature skip-connection mode in the feature pyramid is changed so that larger-scale features are transmitted from the backbone network to the detection head; the PANet structure under the parallel strategy is changed into a double-head structure, and the improved pyramid attention module is connected with a four-layer Bottom-up Path Augmentation structure; the input of the network detection head is changed into four feature maps of different scales, features are aggregated across layers, and the feature maps of different scales are fused to obtain the target detection and identification result; the feature fusion mode of the small target detection model YOLOv4 is adopted while down-sampling feature extraction is performed on the input image, with the addition of adjacent layers changed into the splicing (concatenation) of adjacent layers; the neck connecting the backbone network and the detection head is the improved parallel PANet structure, which is connected to both the backbone network and the network detection head; the network detection head adopts the detection head of the pre-trained YOLOv3 model to predict the type and position of a target, so that the detector focuses on small objects in order to identify small targets; more data are generated through data enhancement, new enhanced images being generated from the basic data set, and the resolution of image acquisition is increased; the feature skip mode in the feature pyramid is changed to improve feature extraction for small-scale targets, the multi-scale features of small targets are extracted and fused, irrelevant categories are filtered out, and the multi-scale features of small targets are retained; down-sampling feature extraction is performed on the input image by the backbone network structure of the small-scale target detection and identification model; and each grid cell of the image predicts a conditional probability value for each category, the prediction result is obtained directly from the image, and the anchor-based detection head outputs the target detection information.
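The change from adding adjacent layers to splicing them can be illustrated with a minimal sketch. It assumes that "splicing" means concatenation along the channel axis and that the deeper map is brought to the shallower map's resolution by nearest-neighbour 2x up-sampling; neither detail is stated in the text. Feature maps are plain nested lists indexed as [channel][row][column], and the function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x up-sampling of a feature map stored as [C][H][W]."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column twice
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row twice
        out.append(rows)
    return out


def fuse_add(shallow, deep):
    """Element-wise addition: both maps need the same channel count."""
    up = upsample2x(deep)
    return [[[a + b for a, b in zip(ra, rb)]
             for ra, rb in zip(ca, cb)]
            for ca, cb in zip(shallow, up)]


def fuse_concat(shallow, deep):
    """Channel-wise concatenation: channels of both maps are kept side by side."""
    return shallow + upsample2x(deep)


def shape(fmap):
    """Return (channels, height, width) of a nested-list feature map."""
    return (len(fmap), len(fmap[0]), len(fmap[0][0]))
```

With addition, the fused map keeps the channel count of its inputs and the two sources are mixed; with concatenation, the channel count is the sum of both inputs, so the shallow detail features reach later layers unmixed at the cost of more channels.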
The improved PANet structure comprises a convolution module, an up-sampling module and a down-sampling module, which form a four-layer feature aggregation structure. Feature pyramids of different scales are connected in sequence to a convolution layer, a down-sampling module and a spatial pyramid pooling part to form the backbone network, and the two largest scale features in the feature pyramid are each connected with the third-largest scale feature layer in the pyramid to form a two-branch feature aggregation structure. The performance advantage of the improved PANet structure in small target detection is verified with a single-stage real-time target detection and identification network composed of residual blocks, down-sampling, Spatial Pyramid Pooling and so on. In the target detection and identification network used to verify the PANet, the neck connecting the backbone network and the detection head adopts the improved parallel PANet structure to extract and fuse the multi-scale features of small targets, and the backbone adopts the backbone network structure of the YOLO model to perform down-sampling feature extraction on the input image.
The target information output by the detection and identification network includes the coordinates of the upper-left corner and of the lower-right corner of the detection box, the class probability, and so on.
The improved small target detection and identification model is composed of a backbone network, a parallel double-head PANet, a YOLOv3 detection head and so on.
The detection head uses an anchor-based grid division method to predict the target box information. For each predicted target box it outputs the four coordinates $b_x$, $b_y$, $b_w$, $b_h$. The offsets of the centre point of the prediction box relative to the upper-left corner of its grid cell are $\sigma(t_x)$ and $\sigma(t_y)$, and the distances from the upper-left corner of the image to the cell containing the target centre are $c_x$ and $c_y$. The actual predicted values are obtained from the centre-point offsets and these distances as:

$$b_x = \sigma(t_x) + c_x, \quad b_y = \sigma(t_y) + c_y, \quad b_w = p_w e^{t_w}, \quad b_h = p_h e^{t_h}$$

where $\sigma$ is the activation function, $t_w$ and $t_h$ are the width and height output by the recognition network, $p_w$ and $p_h$ are the width and height of the corresponding prior target box, and $e$ denotes the exponential operation.
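The decoding step can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the activation σ is taken to be the logistic sigmoid, and width and height are assumed to follow the usual YOLOv3 form b_w = p_w·e^{t_w}, b_h = p_h·e^{t_h}, which matches the exponential operation named in the text. The function names are hypothetical, and the corner conversion corresponds to the upper-left and lower-right coordinates that the network is said to output.

```python
import math


def sigmoid(t):
    """Logistic activation, assumed to be the sigma of the decoding equations."""
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Turn raw head outputs into a box (b_x, b_y, b_w, b_h) in grid units."""
    b_x = sigmoid(t_x) + c_x   # centre x: offset inside the cell plus cell position
    b_y = sigmoid(t_y) + c_y   # centre y
    b_w = p_w * math.exp(t_w)  # width: prior box width scaled exponentially
    b_h = p_h * math.exp(t_h)  # height: prior box height scaled exponentially
    return b_x, b_y, b_w, b_h


def to_corners(b_x, b_y, b_w, b_h):
    """Convert a centre/size box into (x_min, y_min, x_max, y_max)."""
    return (b_x - b_w / 2, b_y - b_h / 2, b_x + b_w / 2, b_y + b_h / 2)
```

Because the sigmoid maps any t_x, t_y into (0, 1), the predicted centre always stays inside the cell that is responsible for the target.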
The detection head of the improved small target detection and identification model is divided into four branches, each with 3 anchors. The four coordinates $b_x$, $b_y$, $b_w$, $b_h$ together with the target probability (confidence) make up the 5 values that constitute the information of a single target box. With the $N$ class probabilities added, the output tensor size of each branch is $3 \times (N + 5)$, and the four branches output $12 \times (N + 5)$ values in total, where $N$ is the number of categories.
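A short sketch of the head output sizes, reading the per-branch size as 3 × (N + 5), that is, 3 anchors each carrying 4 box coordinates, 1 confidence and N class probabilities. The branch strides (4, 8, 16, 32) are an assumption chosen for illustration; the text only states that four scales are used. The function names are hypothetical.

```python
def branch_channels(num_classes, anchors_per_branch=3):
    """Values predicted per grid cell by one branch: anchors * (N + 5)."""
    return anchors_per_branch * (num_classes + 5)


def head_output_shapes(input_size, strides, num_classes, anchors_per_branch=3):
    """Return (grid_h, grid_w, channels) for every detection branch."""
    shapes = []
    for stride in strides:
        grid = input_size // stride  # cells along each image side at this scale
        shapes.append((grid, grid, branch_channels(num_classes, anchors_per_branch)))
    return shapes


# Example with N = 12 categories and a 640 x 640 input;
# the strides are assumed, not taken from the text.
example_shapes = head_output_shapes(640, (4, 8, 16, 32), 12)
```

Under these assumptions each cell of each branch carries 51 values, and the four branches together carry 12 × (12 + 5) = 204 values per cell position.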
Twelve clustering centers $(W_j, H_j)$, $j = 1, 2, \ldots, 12$, are initialized for the small-scale target detection and identification model, and the initial sizes of the 12 anchors are calculated with the K-means clustering algorithm. The intersection over union (IOU) between a labelled target box $box_{GT}$ and an anchor box $box_{Anchor}$ is

$$IOU(box_{GT}, box_{Anchor}) = \frac{|box_{GT} \cap box_{Anchor}|}{|box_{GT} \cup box_{Anchor}|}$$

from which the distance between the target box and the anchor box is obtained as $d_{GT,Anchor} = 1 - IOU(box_{GT}, box_{Anchor})$. The boxes are reassigned and the cluster centers recalculated according to this distance metric until the variation of the cluster centers is smaller than a threshold $th$, at which point the iteration stops.
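The anchor clustering can be sketched in a few lines. The sketch makes several assumptions that the text does not fix: boxes are described by (width, height) only and compared as if they shared a corner, the initial centers are drawn at random from the labelled boxes, a cluster center is updated to the mean width and height of its members, and the center variation is measured as the largest summed absolute change. The function names are hypothetical.

```python
import random


def iou_wh(box, anchor):
    """IOU of two (w, h) boxes aligned at a common corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k=12, th=1e-6, max_iter=300, seed=0):
    """Cluster (w, h) boxes with distance d = 1 - IOU and return k anchors."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)  # assumed initialisation: random labelled boxes
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # assign the box to the center with the smallest 1 - IOU distance
            j = min(range(k), key=lambda i: 1.0 - iou_wh(box, centers[i]))
            clusters[j].append(box)
        new_centers = []
        for j in range(k):
            if clusters[j]:
                w = sum(b[0] for b in clusters[j]) / len(clusters[j])
                h = sum(b[1] for b in clusters[j]) / len(clusters[j])
                new_centers.append((w, h))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        shift = max(abs(n[0] - c[0]) + abs(n[1] - c[1])
                    for n, c in zip(new_centers, centers))
        centers = new_centers
        if shift < th:  # stop once the centers move less than the threshold
            break
    return sorted(centers, key=lambda c: c[0] * c[1])
```

Using 1 - IOU instead of Euclidean distance keeps large boxes from dominating the clustering, which matters when most labelled targets are small.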
The small-scale target detection and identification model is tested on the public data set VisDrone: the performance of the double-head-PANet-based YOLOv4/v5 models is compared with that of the original YOLOv4/v5 models shown in FIG. 3. From the number TP of correctly detected targets, the number FP of targets falsely detected by the model and the number FN of targets missed by the model, the precision P and the recall R of the detected targets are calculated:

$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}$$

A PR curve $P(R)$ is then drawn with the recall R and the precision P as horizontal and vertical coordinates, and the area under the curve gives the average precision AP of a single target category:

$$AP = \int_0^1 P(R)\,dR$$

When there are N categories, model performance is measured by the mean of the per-category APs. The evaluation index adopted by the small-scale target detection and identification model is therefore the mean average precision (mAP), calculated from the category number N as:

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$
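The evaluation quantities can be sketched as follows. The area under the PR curve is approximated here by a simple rectangle sum over detections ranked by confidence, without the interpolation used by some benchmark toolkits; that choice, the input format (a score and a true-positive flag per detection) and the function names are assumptions made for illustration.

```python
def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN); 0.0 when a denominator is 0."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r


def average_precision(scores, is_tp, num_gt):
    """Approximate the area under the PR curve for one category."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for i in order:  # walk down the ranked detections
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_gt
        ap += precision * (recall - prev_recall)  # rectangle under the curve
        prev_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """mAP: arithmetic mean of the per-category AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```

For example, three detections with scores 0.9, 0.8, 0.7 of which the first and third are correct, against two ground-truth targets, give an AP of 0.5 + (2/3) × 0.5 ≈ 0.833 under this approximation.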
The VisDrone data set contains 12 categories of targets: ignored regions, pedestrian, people, bicycle, car, van, truck, tricycle, awning-tricycle, bus, motor and others.
In an alternative embodiment, the 6471 images of the training set are randomly divided at a ratio of 0.8 : 0.2 into the training set and the validation set required by the experiment, and training and testing are performed at an image size of 640 × 640. The mAP values measured on the validation set are shown in Table 1 below:
TABLE 1 mAP values measured on the VisDrone data set
The invention tests the improved PANet structures on the YOLOv4 and YOLOv5 models respectively. As can be seen from the table above, compared with the original YOLOv4-csp model, both the four-layer-PANet-based and the two-branch-PANet-based YOLOv4-csp models obtain higher mAP values on VisDrone, a data set rich in small targets; the improved two-branch-PANet-based small target detection and identification model achieves the highest mAP value, 25.5%. For YOLOv5, the original YOLOv5l obtains an mAP of 21.1% on VisDrone, which is higher than the 20.2% of the four-layer-PANet-based YOLOv5l and lower than the 21.8% of the two-branch-PANet-based YOLOv5l. The improved two-branch PANet thus still obtains the highest mAP with YOLOv5l, whereas the four-layer PANet lowers the performance of YOLOv5l; that is, the four-layer PANet can improve model performance, but its stability is insufficient. Compared with the other two model structures, the improved two-branch PANet uses a parallel double-head design to create a short connection between the layer with the smallest down-sampling rate in the backbone network and the detection output layer, which shortens the transfer path of the large-scale features that benefit small target detection and retains more target features. The improved two-branch PANet structure therefore enables the target detection and identification network to detect small-scale targets better, adapt to changes in target scale, and improve detection and identification performance.
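The random 0.8 : 0.2 division can be sketched as below. The fixed seed, the rounding of the training share to the nearest integer and the function name are assumptions; the text only gives the ratio and the number of images.

```python
import random


def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle the items and split them into (train, validation) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = int(round(len(items) * train_ratio))
    return items[:n_train], items[n_train:]


# Example on 6471 placeholder image indices.
train_ids, val_ids = split_dataset(range(6471))
```

Seeding the shuffle means every model variant is trained and validated on the same images, so differences in mAP come from the network structure and not from the split.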
Having thus described the embodiments of the present invention in detail, those skilled in the art will appreciate that various modifications and adaptations of the embodiments can be made without departing from the spirit and scope of the invention. The appended claims encompass such modifications and variations.