CN112733686A - Target object identification method and device used in image of cloud federation - Google Patents
- Publication number
- CN112733686A CN112733686A CN202011641087.2A CN202011641087A CN112733686A CN 112733686 A CN112733686 A CN 112733686A CN 202011641087 A CN202011641087 A CN 202011641087A CN 112733686 A CN112733686 A CN 112733686A
- Authority
- CN
- China
- Prior art keywords
- region
- interest
- image
- target object
- original image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/176—Urban or other man-made structures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a target object identification method and device for an image of a cloud federation, wherein the method comprises the following steps: carrying out Random-Batch image processing on the original image, fusing the result with the original image, inputting the fused image into a ResNet network, and performing feature extraction to obtain a feature map; inputting the feature map into a bidirectional feature map pyramid network for deep feature map fusion to obtain a feature map with stronger semantic expression capability; inputting that feature map into a region generation network to generate a plurality of candidate boxes; inputting the candidate boxes into an ROI Align network layer to screen out regions of interest, and mapping the regions of interest to the feature map to obtain feature information of the regions of interest; and classifying the regions of interest and performing frame regression and mask network processing on them through the fully connected layer according to the feature information, obtaining a semantic classification result of the original image so as to identify the target object. The method improves the model during training, giving a better effect on fine-grained detection and identification of targets in the image.
Description
Technical Field
The invention relates to the technical field of image recognition, in particular to a method and a device for recognizing a target object in an image of cloud federation.
Background
Compared with common target detection tasks, military wharf target detection in aerial images is more difficult. First, the images themselves are blurred: the shooting distance is long and the resolution is not very high. In addition, a single image may contain bridges and playgrounds larger than 100 × 100 pixels alongside containers and ships smaller than 50 × 50 pixels; the ship targets are dense, some overlap one another, and aircraft and docks lie between them. The image complexity is therefore high, placing stronger multi-scale and precision requirements on the target identification method.
To identify each target in an aerial image of a military wharf, semantic segmentation is first performed to distinguish entities of different classes; instance segmentation is then performed on entities of the same class; finally, the attributes of each target are detected.
At the semantic segmentation level, the main methods at present are R-CNN, Fast-RCNN, Faster-RCNN and the like. The R-CNN network first extracts Proposals (candidate boxes) from an image, then inputs them into a CNN (convolutional neural network) to extract features, classifies the features with an SVM (support vector machine), and finally performs Bbox Reg (frame regression).
To address the slow speed of R-CNN, the Fast-RCNN algorithm was proposed. In Fast-RCNN, the input is changed to the whole image, and feature selection is performed through the ROI. Bbox Reg (frame regression) and region classification are added to the network, making it multi-task; Fast-RCNN removes the R-CNN drawback of feeding every candidate box through the CNN independently, thereby improving speed. However, although Fast-RCNN is much faster, it still spends a great deal of time screening candidate boxes.
To further speed up Proposal (candidate box) selection, the Faster-RCNN algorithm was proposed as an improvement on Fast-RCNN. Faster-RCNN introduced the RPN (Region Proposal Network), an algorithm for quickly extracting Proposals, and integrated it smoothly into Fast-RCNN. For semantic segmentation, Faster-RCNN and similar methods work well, but Faster-RCNN cannot perform instance segmentation and therefore cannot meet the requirements of target identification.
For better target identification, the instance segmentation algorithm Mask-RCNN was proposed on the basis of Faster-RCNN. First, Mask-RCNN improves the ROI (Region of Interest) handling of Faster-RCNN, replacing the original ROI Pooling with ROI Align and greatly reducing the error introduced during Proposal (candidate box) processing. Second, the FPN (feature pyramid network) in Mask-RCNN extends the backbone network and represents targets better across multiple scales. Most importantly, Mask-RCNN adds a parallel Mask (mask network) branch for predicting the target mask alongside the existing bounding-box identification branch, thereby achieving instance segmentation.
However, for fine-grained target identification in wharf remote sensing images, the robustness of Mask-RCNN is still insufficient and its accuracy is not very high. The insufficient robustness and low fine-grained identification accuracy of Mask-RCNN are therefore urgent technical problems to be solved.
Disclosure of Invention
The invention provides a target object identification method and device used in an image of a cloud federation, and aims to solve the technical problems of insufficient robustness and low accuracy of fine-grained target identification of the traditional Mask-RCNN.
In order to achieve the above object, the present invention provides a method for identifying an object in an image of the cloud federation, including the steps of:
carrying out Random-Batch image processing on the original image to obtain a processed image;
fusing the processed image and the original image, inputting the fused image into a ResNet network, and performing feature extraction to obtain a feature map;
inputting the feature map into a bidirectional feature map pyramid network for deep feature map fusion to obtain a feature map with stronger semantic expression capability;
inputting the feature map with stronger semantic expression capability into a region generation network to generate a plurality of candidate frames;
inputting the candidate frames into an ROI Align network layer, and screening out an interested region;
mapping the region of interest to the feature map with stronger semantic expression capability to obtain the features of the region of interest;
and the full connection layer classifies the region of interest, performs frame regression and mask network processing on the region of interest according to the characteristics of the region of interest to obtain a semantic classification result of the original image so as to identify a target object in the original image.
Preferably, the carrying out Random-Batch image processing on the original image to obtain a processed image includes:
for each image to be input, randomly intercepting a target object in the 1280 × 1280 original image with a 640 × 640 screenshot frame, obtaining one 640 × 640 screenshot per image;
randomly selecting 4 screenshots each time and randomly splicing them to obtain a combined image;
mixing the combined image with the original images as the subsequent input.
Preferably, before the step of classifying, performing frame regression and mask network processing on the region of interest by the full-link layer according to the features of the region of interest to obtain a semantic classification result of the original image, so as to identify the target object in the original image, the method further includes:
the method has the advantages that the channel attention mechanism is added to the mask network in the full connection layer, attention can be improved for the target which is needed but not easy to recognize, and accuracy of model recognition is improved.
Preferably, the fully-connected layer performs classification, frame regression and mask network processing on the region of interest according to the features of the region of interest to obtain a semantic classification result of the original image so as to identify the target object in the original image, and the method includes:
inputting the region of interest into a full-connection layer, and classifying the region of interest according to the characteristics of the region of interest to obtain two outputs;
predicting the target object represented by each interested area through one of the outputs so as to classify different targets and obtain a target object prediction result;
performing frame regression on the target object represented by each region of interest through another output to obtain a candidate frame matching the size and the position of the target object, so that the model can identify the target object more accurately;
and according to the target object prediction result and the candidate frame, obtaining a semantic classification result of the original image by utilizing the mask network processing so as to identify the target object in the original image.
Preferably, after the step of classifying, performing frame regression and mask network processing on the region of interest by the fully-connected layer according to the features of the region of interest to obtain a semantic classification result of the original image, so as to identify the target object in the original image, the method further includes:
fine-tuning the hyper-parameters of the ResNet network based on the accuracy of the semantic classification result, wherein the hyper-parameters comprise the activation function, the learning rate and the optimizer; since different networks suit different activation functions, learning rates and optimizers, the most appropriate hyper-parameters are found through repeated debugging, and the optimal semantic classification result of the original image is finally output on a test set.
In addition, in order to achieve the above object, the present invention further provides a target recognition device for an image of a cloud federation, which includes a memory, a processor, and a target recognition program for an image of a cloud federation, stored on the memory and executable on the processor, wherein the target recognition program for an image of a cloud federation is executed by the processor to implement the steps of the target recognition method for an image of a cloud federation.
In addition, in order to achieve the above object, the present invention further provides a storage medium, on which an object recognition program in an image for cloud federation is stored, wherein the object recognition program in an image for cloud federation, when executed by a processor, implements the steps of the object recognition method in an image for cloud federation.
In order to achieve the above object, the present invention also provides an object recognition apparatus for use in an image of the cloud federation, including:
the image processing module is used for carrying out Random-Batch image processing on the original image to obtain a processed image;
the feature extraction module is used for fusing the processed image and the original image and inputting the fused image into a Resnet network for feature extraction to obtain a feature map;
the characteristic fusion module is used for inputting the characteristic graph into a bidirectional characteristic graph pyramid network to perform deep characteristic graph fusion to obtain a characteristic graph with stronger semantic expression capability;
the interesting region selection module is used for inputting the feature map with stronger semantic expression capability into a region generation network to generate a plurality of candidate frames, inputting the candidate frames into an ROI Align network layer and screening an interesting region;
the correlation establishing module is used for mapping the region of interest to the feature map with stronger semantic expression capability, acquiring the features of the region of interest and establishing correlation information between the region of interest and the corresponding features;
and the classification module is used for classifying the region of interest, performing frame regression and mask network processing on the region of interest through a full connection layer according to the associated information to obtain a semantic classification result of the original image so as to identify a target object in the original image.
The invention has the beneficial effects that:
(1) Innovative Random-Batch image processing is added before the original image is input: the spliced images are mixed with the original images as input for subsequent training, which improves the recognition of isolated small targets and raises the accuracy of the model to a certain extent.
(2) The traditional FPN is changed into Bi-FPN, the image features are subjected to complex bidirectional fusion, a feature map capable of expressing semantic features better is obtained, and a better effect is achieved when fine-grained feature extraction is carried out.
(3) A channel attention mechanism is added. It computes the correlation between each channel and the important features, pays more attention to channels with higher correlation, and improves the accuracy of pixel-level classification.
Drawings
FIG. 1 is a flow chart of a military target identification method for aerial images of the cloud federation in accordance with an embodiment of the present invention;
FIG. 2 is a flowchart of the Random-Batch image processing according to an embodiment of the present invention;
FIG. 3 is a structural diagram of Bi-FPN according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flowchart of a military target identification method for an aerial image of cloud federation according to an embodiment of the present invention, and the embodiment of the present invention provides a military target identification method for an aerial image of cloud federation, including the following steps:
s1, carrying out Random-Batch image processing on the original aerial images to obtain processed images;
s2, fusing the processed image and the original aerial image, inputting the fused image into a ResNet50/101 network, and performing feature extraction to obtain a feature map;
s3, inputting the Feature Map into a bidirectional Feature Map pyramid network (Bi-FPN) for deep Feature Map fusion to obtain a Feature Map (Feature Map) with stronger semantic expression capability;
s4, inputting the feature map with stronger semantic expression capability into a region generation network (RPN) to generate a plurality of candidate boxes (Proposals), inputting the candidate boxes into an ROI Align network layer, and screening out a region of interest (ROI);
s5, mapping the region of interest (ROI) to the feature map with stronger semantic expression capability, obtaining the features of the region of interest (ROI), and establishing the associated information between the region of interest (ROI) and the corresponding features;
s6, the fully connected layer carries out classification prediction (Cls_prob), frame regression (Bbox Reg) and mask network (Mask) processing on the region of interest according to the correlation information to obtain a semantic classification result of the original aerial image, so as to identify a military target object in the original image;
the specific steps of S6 are: inputting the region of interest into a full-link layer, and classifying the region of interest (ROI) according to the characteristics corresponding to the region of interest to obtain two outputs;
normalizing one of the outputs with Softmax and performing classification prediction (Cls_prob) on the target object represented by each region of interest, so as to classify the different targets and obtain a target object prediction result;
performing frame regression (Bbox Reg) on the target object represented by each region of interest through another output to obtain a candidate frame matching the size and the position of the target object, so that the model can identify the target object more accurately;
according to the target object prediction result and the candidate box, processing with the mask network, to which a channel attention mechanism (Attention) is added in the fully connected layer; adding the channel attention mechanism raises the attention paid to needed but hard-to-identify targets and improves the accuracy of model identification; the semantic classification result of the original image is finally obtained so as to identify the target object in the original image.
S7, fine-tuning the hyper-parameters of the ResNet network based on the accuracy of the semantic classification result, wherein the hyper-parameters comprise the activation function, the learning rate and the optimizer; the hyper-parameters are repeatedly debugged to find the most appropriate values, and the optimal semantic classification result of the original image is finally output on the test set.
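The repeated debugging of activation function, learning rate and optimizer described in S7 amounts to a search over a small hyper-parameter grid. A minimal sketch of such a search follows; `train_and_eval` is a hypothetical stand-in for training the network and measuring test-set accuracy, and the grid values are illustrative, not those used in the invention.

```python
import itertools

def grid_search(train_and_eval, grid):
    """Exhaustively try hyper-parameter combinations, keeping the best by accuracy."""
    best_cfg, best_acc = None, -1.0
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        acc = train_and_eval(cfg)  # train the model with cfg, return test accuracy
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc

# Illustrative grid in the spirit of S7 (activation function, learning rate, optimizer)
grid = {"activation": ["relu", "leaky_relu"],
        "lr": [0.1, 0.01, 0.001],
        "optimizer": ["sgd", "adam"]}
```

In practice each call to `train_and_eval` is a full training run, so the grid is kept small and narrowed iteratively.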
In aerial images of military docks, many small targets such as containers and small ships are, compared with large ships, bridges, playgrounds and the like, difficult to identify with the traditional Mask-RCNN algorithm; innovative Random-Batch image processing is therefore applied to the data set when images are input.
Referring to FIG. 2, FIG. 2 is a flowchart of the Random-Batch image processing according to an embodiment of the present invention. The Random-Batch image processing of the original images specifically comprises the following steps:
for each aerial image to be input, intercepting a target object in the 1280 × 1280 original aerial image with a 640 × 640 screenshot frame, obtaining one 640 × 640 screenshot per image; randomly selecting 4 screenshots each time and randomly splicing them into a combined complete image; and mixing the combined complete images with the original aerial images as the subsequent input. The mixed images are input into the ResNet50/101 network for training, which improves the recognition of isolated small targets and raises the accuracy of the model to a certain extent.
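A minimal NumPy sketch of this Random-Batch step, under the assumption that each crop is taken uniformly at random and the four screenshots are tiled into a 2 × 2 grid; the function names are illustrative, not from the patent.

```python
import numpy as np

def random_crop(image, crop=640):
    """Randomly cut one crop x crop screenshot out of a larger image."""
    h, w = image.shape[:2]
    y = np.random.randint(0, h - crop + 1)
    x = np.random.randint(0, w - crop + 1)
    return image[y:y + crop, x:x + crop]

def random_batch_mosaic(images, crop=640):
    """Splice four random 640x640 screenshots into one 1280x1280 combined image."""
    picks = [images[i] for i in np.random.choice(len(images), 4)]
    tiles = [random_crop(img, crop) for img in picks]
    top = np.concatenate(tiles[:2], axis=1)       # left-right
    bottom = np.concatenate(tiles[2:], axis=1)
    return np.concatenate([top, bottom], axis=0)  # top-bottom
```

The combined images would then be mixed with the original 1280 × 1280 images before being fed to the backbone.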
To identify small targets at a finer granularity: the Mask-RCNN network uses an FPN (feature pyramid network) when generating candidate boxes (Proposals), but this is not sufficient for fine-grained feature identification, so the method uses a Bi-FPN (bidirectional feature pyramid network) instead.
Referring to FIG. 3, FIG. 3 is a structural diagram of the Bi-FPN according to an embodiment of the present invention. The Bi-FPN performs complex bidirectional fusion on top of the FPN. A feature map contains both shallow and deep information of a picture; the FPN (feature pyramid network) simply outputs the information of each layer, whereas the Bi-FPN (bidirectional feature map pyramid network) fuses information from different layers through convolutional neural networks with different convolution kernels. To strengthen the fusion of each layer's information, the Bi-FPN network stacks the replicated Block (repeating unit) 3 times. The output of each layer thus fuses information from different layers of the picture, yielding feature maps with stronger semantic expression capability and a better effect for fine-grained feature extraction.
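The bidirectional fusion pattern can be sketched as follows. A real Bi-FPN resizes features between pyramid levels and applies convolutions with learnable fusion weights; this sketch is an assumption-laden simplification that keeps all levels at one spatial size and fixes the weights at 1, showing only the normalized weighted top-down and bottom-up passes.

```python
import numpy as np

def fuse(feats, weights, eps=1e-4):
    """Fast normalized fusion: weighted sum with non-negative (learnable) weights."""
    w = np.maximum(weights, 0.0)
    w = w / (w.sum() + eps)
    return sum(wi * f for wi, f in zip(w, feats))

def bifpn_layer(p):
    """One Bi-FPN repeat over pyramid levels (same spatial size here for simplicity)."""
    n = len(p)
    td = [None] * n          # top-down pass: deep semantics flow downward
    td[-1] = p[-1]
    for i in range(n - 2, -1, -1):
        td[i] = fuse([p[i], td[i + 1]], np.array([1.0, 1.0]))
    out = [None] * n         # bottom-up pass: shallow detail flows upward
    out[0] = td[0]
    for i in range(1, n):
        out[i] = fuse([p[i], td[i], out[i - 1]], np.array([1.0, 1.0, 1.0]))
    return out
```

Stacking this layer 3 times, as the description states, corresponds to calling `bifpn_layer` repeatedly on its own output.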
In addition, a channel attention mechanism is added to the mask network in the fully connected layer; it raises the attention paid to targets that need to be identified but are difficult to identify, and improves the accuracy of model identification.
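The patent does not fix the exact form of the channel attention; a common choice is a Squeeze-and-Excitation style block, sketched here as an assumption, with `w1` and `w2` standing in for learned projection weights.

```python
import numpy as np

def channel_attention(feature_map, w1, w2):
    """SE-style channel attention over a (C, H, W) feature map.

    w1: (C // r, C) reduction weights; w2: (C, C // r) expansion weights.
    Channels more correlated with important features get excitation closer to 1.
    """
    squeeze = feature_map.mean(axis=(1, 2))        # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)         # FC + ReLU -> (C // r,)
    excite = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # FC + sigmoid -> (C,)
    return feature_map * excite[:, None, None]     # rescale each channel
```

Because the sigmoid output lies in (0, 1), the block can only down-weight less relevant channels, which matches the described effect of concentrating attention on hard-to-identify targets.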
The invention is mainly aimed at fine-grained target detection for aerial military wharf images in the AI rocket military competition project. Under ordinary conditions the detection targets are obvious, relatively large and relatively few, so a Mask-RCNN model performs well; but in wharf remote sensing images the pictures are unclear, the targets are blurred and of varying sizes, and more targets need to be detected, so the traditional Mask-RCNN performs poorly, while the improved Mask-RCNN model is markedly more accurate on the wharf remote sensing image fine-grained detection task. As shown in Table 1, the traditional Mask-RCNN model achieves an mAP of only 54.765 on the test data of the competition project; after Random-Batch image processing is added, the mAP reaches 58.652; after the FPN is further replaced by Bi-FPN, the mAP reaches 64.157; and with the channel attention mechanism also added, the final mAP reaches 68.227, placing the result in the top 20% of all teams in the competition.
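The frame regression (Bbox Reg) referred to in step S6 conventionally uses the Faster-RCNN box parameterization; the patent does not spell it out, so the following is a sketch under the assumption that boxes are given as (center x, center y, width, height).

```python
import numpy as np

def bbox_transform(anchor, gt):
    """Regression targets for a ground-truth box relative to an anchor/proposal."""
    ax, ay, aw, ah = anchor
    gx, gy, gw, gh = gt
    return np.array([(gx - ax) / aw,        # normalized center shift in x
                     (gy - ay) / ah,        # normalized center shift in y
                     np.log(gw / aw),       # log-scale width change
                     np.log(gh / ah)])      # log-scale height change

def bbox_decode(anchor, deltas):
    """Invert the transform: apply predicted deltas to the anchor box."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = deltas
    return np.array([ax + tx * aw, ay + ty * ah,
                     aw * np.exp(tw), ah * np.exp(th)])
```

Decoding the predicted deltas yields the candidate box matched to the size and position of the target object, as described above.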
In addition, the embodiment of the invention also provides military target identification equipment for the aerial image of the cloud federation, which comprises a memory, a processor and a military target identification program for the aerial image of the cloud federation, wherein the military target identification program is stored in the memory and can be operated on the processor, and the military target identification program for the aerial image of the cloud federation realizes the steps of the military target identification method for the aerial image of the cloud federation when being executed by the processor.
In addition, the specific embodiment of the invention also provides a storage medium, wherein a military target identification method program for the aerial image of the cloud federation is stored on the storage medium, and the military target identification method program for the aerial image of the cloud federation realizes the steps of the military target identification method for the aerial image of the cloud federation when being executed by a processor.
In addition, a military target recognition device for aerial images of cloud federation is further provided in the specific embodiment of the present invention, and the military target recognition device for aerial images of cloud federation includes:
the image processing module is used for carrying out Random-Batch image processing on the original image to obtain a processed image;
the feature extraction module is used for fusing the processed image and the original image and inputting the fused image into a Resnet network for feature extraction to obtain a feature map;
the characteristic fusion module is used for inputting the characteristic graph into a bidirectional characteristic graph pyramid network to perform deep characteristic graph fusion to obtain a characteristic graph with stronger semantic expression capability;
the interesting region selection module is used for inputting the feature map with stronger semantic expression capability into a region generation network to generate a plurality of candidate frames, inputting the candidate frames into an ROI Align network layer and screening an interesting region;
the correlation establishing module is used for mapping the region of interest to the feature map with stronger semantic expression capability, acquiring the features of the region of interest and establishing correlation information between the region of interest and the corresponding features;
and the classification module is used for classifying the region of interest, performing frame regression and mask network processing on the region of interest through a full connection layer according to the associated information to obtain a semantic classification result of the original image so as to identify a target object in the original image.
The beneficial effects brought by the specific embodiment of the invention are as follows:
(1) Innovative Random-Batch image processing is added before the original aerial images are input: the spliced images are mixed with the original images as input for subsequent training, which improves the recognition of isolated small targets and raises the accuracy of the model to a certain extent.
(2) The traditional FPN is changed into Bi-FPN, the image features are subjected to complex bidirectional fusion, a feature map capable of expressing semantic features better is obtained, and a better effect is achieved when fine-grained feature extraction is carried out.
(3) A channel attention mechanism is added. It computes the correlation between each channel and the important features, pays more attention to channels with higher correlation, and improves the accuracy of pixel-level classification.
TABLE 1 Comparison of recognition results for various models
Traditional Mask-RCNN: mAP 54.765
+ Random-Batch image processing: mAP 58.652
+ Bi-FPN: mAP 64.157
+ channel attention mechanism: mAP 68.227
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (7)
1. A target object recognition method used in an image of a cloud federation is characterized by comprising the following steps:
carrying out Random-Batch image processing on the original image to obtain a processed image;
fusing the processed image and the original image, inputting the fused image into a ResNet network, and performing feature extraction to obtain a feature map;
inputting the feature map into a bidirectional feature map pyramid network for deep feature map fusion to obtain a feature map with stronger semantic expression capability;
inputting the feature map with stronger semantic expression capability into a region generation network to generate a plurality of candidate frames;
inputting the candidate boxes into an ROI Align network layer, and screening out an interested region;
mapping the region of interest to the feature map with stronger semantic expression capability to obtain the features of the region of interest;
and the full connection layer classifies the region of interest, performs frame regression and mask network processing on the region of interest according to the characteristics of the region of interest to obtain a semantic classification result of the original image so as to identify a target object in the original image.
2. The method for identifying the target object in the image for the cloud federation of claim 1, wherein the carrying out Random-Batch image processing on the original image to obtain a processed image comprises:
randomly intercepting, for each of the original images, a 1/4 image containing the target object, and randomly splicing four of the 1/4 images containing the target object to obtain the processed image.
3. The method for identifying the target object in the image of the cloud federation as claimed in claim 1, wherein before the step of classifying, frame regression and mask network processing the region of interest by the full connectivity layer according to the features of the region of interest to obtain the semantic classification result of the original image so as to identify the target object in the original image, the method further comprises:
and adding a channel attention mechanism to the mask network in the full connection layer.
4. The target object identification method used in an image of a cloud federation according to claim 1, wherein classifying the region of interest and performing bounding-box regression and mask-network processing on it through the fully connected layer according to the features of the region of interest to obtain the semantic classification result of the original image and identify the target object in the original image comprises:
inputting the features of the region of interest into the fully connected layer and classifying the region of interest according to those features to obtain two outputs;
predicting, through one of the outputs, the target object represented by each region of interest to obtain a target object prediction result;
performing, through the other output, bounding-box regression on the target object represented by each region of interest to obtain a candidate frame matching the size and position of the target object;
and obtaining, from the target object prediction result and the candidate frame, a semantic classification result of the original image through the mask-network processing, so as to identify the target object in the original image.
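The two fully connected outputs of claim 4 can be sketched as a classification head and a box-regression head over one region-of-interest feature vector. The weights and the delta parameterisation below are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def two_head_outputs(roi_feat, w_cls, w_box):
    """One output predicts the class of the RoI (softmax over logits);
    the other regresses (dx, dy, dw, dh) deltas for the candidate frame."""
    logits = w_cls @ roi_feat
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # softmax class probabilities
    deltas = w_box @ roi_feat            # box deltas for this RoI
    return int(probs.argmax()), probs, deltas

def apply_deltas(box, deltas):
    """Refine an (x, y, w, h) candidate frame with the predicted deltas."""
    x, y, w, h = box
    dx, dy, dw, dh = deltas
    return (x + dx * w, y + dy * h, w * np.exp(dw), h * np.exp(dh))
```

The predicted class and the refined candidate frame then feed the mask-network step, which produces the per-pixel semantic classification result.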
5. The target object identification method used in an image of a cloud federation according to claim 1, wherein after the step of classifying the region of interest and performing bounding-box regression and mask-network processing on it through the fully connected layer according to the features of the region of interest to obtain the semantic classification result of the original image and identify the target object in the original image, the method further comprises:
fine-tuning the hyper-parameters of the ResNet network based on the accuracy of the semantic classification result, and outputting the optimal semantic classification result of the original image.
6. The target object identification method used in an image of a cloud federation according to claim 5, wherein the hyper-parameters comprise at least one of a learning rate, an activation function, and an optimizer.
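The fine-tuning loop of claims 5 and 6 amounts to searching over the listed hyper-parameters and keeping the setting with the best classification accuracy. The sketch below assumes a `train_and_eval` callable standing in for training the ResNet-based model and measuring accuracy; it is not part of the patent.

```python
import itertools

def tune_hyperparameters(train_and_eval, learning_rates, optimizers):
    """Try each (learning rate, optimizer) combination, score it by
    semantic-classification accuracy, and keep the best setting."""
    best_setting, best_acc = None, -1.0
    for lr, opt in itertools.product(learning_rates, optimizers):
        acc = train_and_eval(lr=lr, optimizer=opt)
        if acc > best_acc:
            best_setting, best_acc = (lr, opt), acc
    return best_setting, best_acc
```

An activation-function axis could be added as a third argument to `itertools.product` in the same way.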
7. A target object identification device used in an image of a cloud federation, the device comprising:
an image processing module, configured to perform Random-Batch image processing on an original image to obtain a processed image;
a feature extraction module, configured to fuse the processed image with the original image and input the fused image into a ResNet network for feature extraction to obtain a feature map;
a feature fusion module, configured to input the feature map into a bidirectional feature pyramid network for deep feature-map fusion to obtain a feature map with stronger semantic expression capability;
a region-of-interest selection module, configured to input the feature map with stronger semantic expression capability into a region proposal network to generate a plurality of candidate frames, and to input the candidate frames into a ROIAlign network layer to screen out regions of interest;
an association module, configured to map the regions of interest onto the feature map with stronger semantic expression capability, obtain the features of each region of interest, and establish association information between each region of interest and its corresponding features;
and a classification module, configured to classify the regions of interest and perform bounding-box regression and mask-network processing on them through a fully connected layer according to the association information, so as to obtain a semantic classification result of the original image and identify the target object in the original image.
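The device claim chains its modules into one pipeline. The sketch below shows only the data flow between the claimed modules, with each module as a callable stand-in; the dictionary keys are illustrative names, not from the patent.

```python
def recognize(original_image, modules):
    """Run the claimed module chain: augment, extract, fuse, select RoIs,
    associate RoIs with features, then classify."""
    processed = modules["image_processing"](original_image)
    feature_map = modules["feature_extraction"](processed, original_image)
    fused = modules["feature_fusion"](feature_map)
    rois = modules["roi_selection"](fused)
    association = modules["association"](rois, fused)
    return modules["classification"](association)
```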
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011641087.2A CN112733686A (en) | 2020-12-31 | 2020-12-31 | Target object identification method and device used in image of cloud federation |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN112733686A true CN112733686A (en) | 2021-04-30 |
Family
ID=75609051
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011641087.2A Pending CN112733686A (en) | 2020-12-31 | 2020-12-31 | Target object identification method and device used in image of cloud federation |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112733686A (en) |
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110909642A * | 2019-11-13 | 2020-03-24 | Nanjing University of Science and Technology | Remote sensing image target detection method based on multi-scale semantic feature fusion |
| CN111696077A * | 2020-05-11 | 2020-09-22 | Yuyao Zhejiang University Robot Research Center | Wafer defect detection method based on wafer Det network |
| CN111985316A * | 2020-07-10 | 2020-11-24 | Shanghai Fujie Technology Co., Ltd. | A pavement garbage perception method for road intelligent cleaning |
| CN112016510A * | 2020-09-07 | 2020-12-01 | Ping An International Smart City Technology Co., Ltd. | Signal lamp identification method and device based on deep learning, equipment and storage medium |
2020-12-31: application CN202011641087.2A filed in China (CN); published as CN112733686A, status pending
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113989632A * | 2021-09-13 | 2022-01-28 | Xidian University | Bridge detection method and device for remote sensing image, electronic equipment and storage medium |
| CN114202693A * | 2021-12-10 | 2022-03-18 | Shenzhen Qiyang Special Equipment Technology Engineering Co., Ltd. | Illumination intensity identification method and system, electronic equipment and medium |
| CN114399643A * | 2021-12-13 | 2022-04-26 | Alibaba (China) Co., Ltd. | Image processing method, storage medium, and computer terminal |
| CN114399643B * | 2021-12-13 | 2025-09-02 | Alibaba (China) Co., Ltd. | Image processing method, storage medium and computer terminal |
| CN115100486A * | 2022-06-07 | 2022-09-23 | Nanjing University of Posts and Telecommunications | Chinese herbal medicine identification method based on time-related feature mining in smart medical scene |
| CN115512248A * | 2022-09-29 | 2022-12-23 | Qufu Normal University | Unmanned aerial vehicle aerial photography water surface floater identification method based on neural network |
| CN115512248B * | 2022-09-29 | 2025-08-22 | Qufu Normal University | A method for identifying floating objects on the water surface from UAV aerial photography based on neural network |
Similar Documents
| Publication | Title |
|---|---|
| CN112561910B | Industrial surface defect detection method based on multi-scale feature fusion |
| CN113989662B | A fine-grained target recognition method for remote sensing images based on self-supervised mechanism |
| CN107609601B | Ship target identification method based on multilayer convolutional neural network |
| CN112733686A | Target object identification method and device used in image of cloud federation |
| CN109902715B | Infrared dim target detection method based on context aggregation network |
| CN111814902A | Target detection model training method, target recognition method, device and medium |
| CN102968637B | Complicated background image and character division method |
| CN105574550A | Vehicle identification method and device |
| CN106156777B | Text image detection method and device |
| Zhao et al. | BiTNet: A lightweight object detection network for real-time classroom behavior recognition with transformer and bi-directional pyramid network |
| US20220180624A1 | Method and device for automatic identification of labels of an image |
| CN112183672B | Image classification method, feature extraction network training method and device |
| CN116385958A | An edge intelligent detection method for power grid inspection and monitoring |
| CN112364721A | Road surface foreign matter detection method |
| CN112288700A | Rail defect detection method |
| CN107169417A | RGBD image collaborative saliency detection method based on multi-kernel enhancement and saliency fusion |
| CN114332473A | Object detection method, object detection device, computer equipment, storage medium and program product |
| CN117011616B | Image content auditing method and device, storage medium and electronic equipment |
| CN108133235A | A pedestrian detection method based on neural network multi-scale feature maps |
| CN114898290A | Real-time detection method and system for marine ship |
| CN109086737B | Convolutional neural network-based shipping cargo monitoring video identification method and system |
| Bakır et al. | Evaluating the robustness of yolo object detection algorithm in terms of detecting objects in noisy environment |
| Li et al. | An improved PCB defect detector based on feature pyramid networks |
| CN109977875A | Gesture identification method and equipment based on deep learning |
| Prasetiyo et al. | Differential augmentation data for vehicle classification using convolutional neural network |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210430 |