CN114120127B - Target detection method, device and related equipment
- Publication number: CN114120127B (application number CN202111445714.XA)
- Authority: CN (China)
Classifications
- G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/23213: Non-hierarchical clustering techniques using statistics or function optimisation, with a fixed number of clusters, e.g. K-means clustering
- G06F18/24: Classification techniques
- G06N3/045: Neural networks; combinations of networks
- G06N3/08: Neural networks; learning methods
Abstract
The application discloses a target detection method, apparatus and related device. The method includes: processing an image to be detected with a detection model to obtain detection frames and, for each detection frame, a classification confidence and a positioning confidence; computing the detection confidence of each detection frame from its classification confidence and positioning confidence; selecting the detection frame with the highest detection confidence as a first detection frame and calculating the comprehensive intersection-over-union (comprehensive IoU) between the first detection frame and every other detection frame; determining a filtering threshold for each detection frame according to its detection confidence and removing those other detection frames whose comprehensive IoU exceeds the filtering threshold; based on the detection frames remaining after the first detection frame is set aside, returning to the step of selecting the detection frame with the highest detection confidence as a first detection frame and calculating the comprehensive IoU for further filtering, until no detection frame remains; and outputting all first detection frames. The method can effectively improve the accuracy of target detection results.
Description
Technical Field
The present application relates to the field of computer vision, and in particular to a target detection method, as well as a target detection apparatus, a target detection device and a computer-readable storage medium.
Background
Target detection is an important task in computer vision, and the dense-occlusion problem is one of the most challenging problems in detection tasks. In practical application scenes such as crowded shopping malls and urban streets, detection algorithms are required to remain effective for densely occluded pedestrians, vehicles and the like.
Non-maximum suppression (NMS) is a necessary post-processing step in target detection algorithms, used to eliminate redundant detection frames on the same object. However, in dense-occlusion scenes the overlap between targets is high, i.e. the intersection-over-union (IoU) between them is large. The traditional NMS algorithm deletes every detection frame whose IoU with the retained frame exceeds the NMS threshold. Clearly, when the NMS threshold is set low, occluded targets with a high overlap are filtered out and the recall rate drops; when the NMS threshold is set high, the detection rate of occluded targets improves, but redundant detection frames are insufficiently filtered and false detections increase. Therefore, the traditional implementation suffers from inaccurate target detection results in dense-occlusion scenes.
Therefore, how to effectively improve the accuracy of the target detection result is a problem to be solved by those skilled in the art.
Disclosure of Invention
One object of the application is to provide a target detection method that can effectively improve the accuracy of target detection results; another object of the application is to provide a target detection apparatus, a target detection device and a computer-readable storage medium, each having the above beneficial effects.
In a first aspect, the present application provides a target detection method, including:
processing an image to be detected with a detection model to obtain detection frames and, for each detection frame, a classification confidence and a positioning confidence;
obtaining the detection confidence of each detection frame according to its classification confidence and positioning confidence;
selecting the detection frame with the highest detection confidence as a first detection frame, and calculating the comprehensive intersection-over-union (comprehensive IoU) between the first detection frame and every other detection frame;
determining a filtering threshold for each detection frame according to its detection confidence, and removing those other detection frames whose comprehensive IoU exceeds the filtering threshold;
based on the detection frames remaining after the first detection frame is set aside, returning to the step of selecting the detection frame with the highest detection confidence as a first detection frame and calculating the comprehensive IoU to filter detection frames, until no detection frame remains;
outputting all first detection frames.
Preferably, the calculating of the comprehensive IoU between the first detection frame and every other detection frame includes:
calculating the intersection-over-union (IoU) between the first detection frame and the other detection frame;
calculating the center-point distance between the first detection frame and the other detection frame;
calculating the feature difference value between the first detection frame and the other detection frame;
calculating the comprehensive IoU according to the IoU, the center-point distance and the feature difference value.
Preferably, the calculating of the feature difference value between the first detection frame and the other detection frame includes:
performing feature extraction on the first detection frame to obtain a first detection feature;
performing feature extraction on the other detection frame to obtain a second detection feature;
calculating the feature difference value according to the first detection feature and the second detection feature.
Preferably, the determining of the filtering threshold of each detection frame according to its detection confidence includes:
when the detection confidence exceeds a preset threshold, taking a first threshold as the filtering threshold of the detection frame;
when the detection confidence does not exceed the preset threshold, taking a second threshold as the filtering threshold of the detection frame; wherein the first threshold is greater than the second threshold.
Preferably, the training process of the detection model includes:
training an algorithm model with a training image set to obtain several initial detection models whose model loss satisfies a preset condition;
screening all initial detection models with a test image set to obtain an optimal detection model, and taking the optimal detection model as the detection model.
Preferably, the model loss includes a box regression loss, a confidence loss, a classification loss and a positioning loss.
In a second aspect, the present application also discloses a target detection apparatus, including:
an image processing module, configured to process an image to be detected with a detection model to obtain detection frames and, for each detection frame, a classification confidence and a positioning confidence;
a confidence calculation module, configured to calculate the detection confidence of each detection frame according to its classification confidence and positioning confidence;
a comprehensive IoU calculation module, configured to select the detection frame with the highest detection confidence as a first detection frame and calculate the comprehensive IoU between the first detection frame and every other detection frame;
a detection frame filtering module, configured to determine a filtering threshold for each detection frame according to its detection confidence and remove those other detection frames whose comprehensive IoU exceeds the filtering threshold;
a cyclic filtering module, configured to return, based on the detection frames remaining after the first detection frame is set aside, to selecting the detection frame with the highest detection confidence as a first detection frame and calculating the comprehensive IoU to filter detection frames, until no detection frame remains;
a detection frame output module, configured to output all first detection frames.
Preferably, the comprehensive IoU calculation module includes:
an IoU calculation unit, configured to calculate the IoU between the first detection frame and the other detection frame;
a center-point distance calculation unit, configured to calculate the center-point distance between the first detection frame and the other detection frame;
a feature difference value calculation unit, configured to calculate the feature difference value between the first detection frame and the other detection frame;
a comprehensive IoU calculation unit, configured to calculate the comprehensive IoU according to the IoU, the center-point distance and the feature difference value.
In a third aspect, the present application also discloses a target detection device, including:
A memory for storing a computer program;
a processor, configured to implement the steps of any one of the target detection methods described above when executing the computer program.
In a fourth aspect, the present application also discloses a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any one of the target detection methods described above.
The application provides a target detection method that includes: processing an image to be detected with a detection model to obtain detection frames and, for each detection frame, a classification confidence and a positioning confidence; obtaining the detection confidence of each detection frame according to its classification confidence and positioning confidence; selecting the detection frame with the highest detection confidence as a first detection frame and calculating the comprehensive IoU between the first detection frame and every other detection frame; determining a filtering threshold for each detection frame according to its detection confidence and removing those other detection frames whose comprehensive IoU exceeds the filtering threshold; based on the detection frames remaining after the first detection frame is set aside, returning to the step of selecting the detection frame with the highest detection confidence as a first detection frame and calculating the comprehensive IoU to filter detection frames, until no detection frame remains; and outputting all first detection frames.
It can thus be seen that, in the target detection method provided by the application, a prediction branch for the positioning confidence of detection frames is added to a conventional detection model, and the detection confidence of each detection frame is obtained by combining its classification confidence and positioning confidence, so that detection frames with high confidence are positioned more accurately. On this basis, a multi-threshold filtering method is used to filter redundant detection frames, i.e. different filtering thresholds are set for different detection confidences, which effectively alleviates false detections and missed detections in dense-occlusion scenes, improves the accuracy of target detection results, and improves the applicability of the detection algorithm in dense-occlusion scenes.
The target detection apparatus, device and computer-readable storage medium provided by the application have the same beneficial effects, which are not repeated here.
Drawings
In order to more clearly illustrate the technical solutions in the prior art and in the embodiments of the present application, the drawings needed for describing them are briefly introduced below. Of course, the following drawings relate only to some embodiments of the present application; those skilled in the art can obtain other drawings from the provided drawings without inventive effort, and such other drawings also fall within the scope of the present application.
FIG. 1 is a schematic flow chart of a target detection method according to the present application;
FIG. 2 is a diagram of a model structure of a detection model according to the present application;
FIG. 3 is a schematic diagram illustrating the intersection calculation of two detection frames according to the present application;
FIG. 4 is a schematic diagram of a union calculation of two detection frames according to the present application;
FIG. 5 is a schematic diagram illustrating calculation of a center point distance between two detection frames according to the present application;
FIG. 6 is a schematic diagram of a target detection apparatus according to the present application;
Fig. 7 is a schematic structural diagram of an object detection device according to the present application.
Detailed Description
The core of the application is to provide a target detection method that can effectively improve the accuracy of target detection results; another core of the application is to provide a target detection apparatus, a target detection device and a computer-readable storage medium, which also have the above beneficial effects.
In order to more clearly and completely describe the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The embodiment of the application provides a target detection method.
Referring to fig. 1, fig. 1 is a flow chart of a target detection method provided by the present application, where the target detection method may include:
S101: processing an image to be detected with a detection model to obtain detection frames and, for each detection frame, a classification confidence and a positioning confidence;
This step processes the image to be detected with the detection model. Specifically, when an image to be detected is received, it is input into the detection model, and the model outputs a number of detection frames detected in the image together with the classification confidence and the positioning confidence of each detection frame. The detection model is a pre-created neural network model; it can be stored in advance in a corresponding storage space and called directly from that storage space when target detection is performed.
The image to be detected is the image on which target detection needs to be performed. A detection frame is the bounding box of a preset detection object in the image to be detected; its shape is generally rectangular, and its number is not unique. For example, when persons need to be detected, the human bodies in the image to be detected are the preset detection objects, and the detection frames are the bounding boxes of all human bodies in the image; when vehicles need to be detected, the vehicles in the image are the preset detection objects, and the detection frames are the bounding boxes of all vehicles in the image.
It can be understood that the number of detection frames of a preset detection object obtained after the image is processed by the detection model is generally more than one, and the target detection method provided by the application aims to screen, from these detection frames, the detection frame that positions the preset detection object most accurately. The classification confidence and the positioning confidence are used to compute the detection confidence of the corresponding detection frame; fusing the two makes the resulting detection confidence more reasonable and reliable and improves the accuracy of the detection frame position to a certain extent, as described in S102.
As a preferred embodiment, the training process of the detection model may include: training an algorithm model with a training image set to obtain several initial detection models whose model loss satisfies a preset condition; and screening all initial detection models with a test image set to obtain an optimal detection model, and taking the optimal detection model as the detection model.
This preferred embodiment provides a training method for the detection model. Specifically, a sample image set may be acquired in advance and divided into a training image set and a test image set, where the training image set is used for model training and the test image set is used for model screening. The training image set is used to train an initial algorithm model to obtain several initial detection models whose model loss satisfies a preset condition; the preset condition may be that the model loss is within a preset threshold and tends to be stable. Further, the indexes of each initial detection model are verified with the test image set, so that the optimal detection model with the best index value is obtained by screening.
The process of training the initial algorithm model with the training image set may be implemented based on the YOLO v4 algorithm (You Only Look Once, an object recognition and localization algorithm based on a deep neural network). It should be noted that when the conventional YOLO v4 model processes an image, its output only includes the detection frames and the classification confidence of each detection frame; therefore, in order to obtain the positioning confidence of the detection frames, a positioning-confidence prediction branch may be added on the basis of the conventional YOLO v4 model, so as to obtain the positioning confidence of each detection frame.
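For illustration only, a minimal PyTorch-style sketch of such an added positioning-confidence branch is given below; the module name, layer sizes and sigmoid activation are assumptions made for the example rather than the structure actually claimed by the application.

```python
# A minimal sketch (not the patented model itself) of how a YOLO-style
# detection head could expose an extra IoU-score (positioning-confidence) branch.
import torch
import torch.nn as nn

class HeadWithIoUBranch(nn.Module):
    def __init__(self, in_channels: int, num_anchors: int, num_classes: int):
        super().__init__()
        # Original YOLO-style outputs: 4 box offsets + 1 objectness + class scores.
        self.det = nn.Conv2d(in_channels, num_anchors * (5 + num_classes), kernel_size=1)
        # Added branch: one IoU (positioning-confidence) score per anchor.
        self.iou = nn.Conv2d(in_channels, num_anchors, kernel_size=1)

    def forward(self, feat: torch.Tensor):
        det_out = self.det(feat)                  # (N, A*(5+C), H, W)
        iou_out = torch.sigmoid(self.iou(feat))   # (N, A, H, W), values in [0, 1]
        return det_out, iou_out

# Example: a 512-channel neck feature map with 3 anchors and 80 classes.
head = HeadWithIoUBranch(512, num_anchors=3, num_classes=80)
det_out, iou_out = head(torch.randn(1, 512, 13, 13))
```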
It will be appreciated that implementing the detection model training based on the YOLO v4 algorithm is only one implementation provided in this preferred embodiment, and the present application is not limited thereto; for example, SSD (Single Shot MultiBox Detector, an object detection algorithm), RCNN (Region-based Convolutional Neural Networks) and the like may also be used. Similarly, in order to obtain the above data (the detection frames and the classification confidence and positioning confidence of each detection frame), a corresponding prediction branch can be added on the basis of the conventional algorithm.
As a preferred embodiment, the model loss may include a box regression loss, a confidence loss, a classification loss and a positioning loss.
It can be understood that computing the model loss by fusing the above losses can effectively improve the accuracy of the detection model, and thus improve the accuracy of the processing result on the image to be detected.
S102: obtaining the detection confidence of each detection frame according to its classification confidence and positioning confidence;
This step computes the detection confidence of each detection frame, i.e. the classification confidence and the positioning confidence are fused to obtain the final detection confidence of the corresponding detection frame, for example as:
Sdet = (pm)^a · (IoUm)^(1-a)
wherein Sdet is the detection confidence, pm is the classification confidence of the target output by the m-th grid cell, IoUm is the IoU positioning confidence of the target output by the m-th grid cell, and a is a weight controlling the relative contribution of the two.
It can be appreciated that, when evaluating the accuracy of a detection frame, using a final detection confidence that fuses the classification confidence and the IoU positioning confidence is more reasonable and reliable than relying on the classification confidence alone.
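As an illustration, the fusion step can be written as a small NumPy function; the power-weighted form and the default value of the parameter a follow the example formula above and are assumptions rather than the application's literal implementation.

```python
import numpy as np

def detection_confidence(cls_conf: np.ndarray, iou_conf: np.ndarray, a: float = 0.5) -> np.ndarray:
    """Fuse classification confidence and IoU positioning confidence.

    cls_conf, iou_conf: arrays of shape (num_boxes,) with values in [0, 1].
    a: weight controlling the contribution of the two terms (assumed form).
    """
    return np.power(cls_conf, a) * np.power(iou_conf, 1.0 - a)

# Example: three candidate detection frames.
cls_conf = np.array([0.9, 0.8, 0.6])
iou_conf = np.array([0.7, 0.9, 0.5])
print(detection_confidence(cls_conf, iou_conf, a=0.5))
```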
S103: selecting the detection frame with the highest detection confidence as a first detection frame, and calculating the comprehensive IoU between the first detection frame and every other detection frame;
Specifically, after the detection confidence of every detection frame has been computed, the detection frame with the highest detection confidence is selected as the first detection frame, and the comprehensive IoU between it and every other detection frame is calculated. The comprehensive IoU is an intersection-over-union that fuses multi-dimensional information, and enables more accurate removal of redundant detection frames.
It can be understood that a higher detection confidence implies a more accurate detection frame; therefore, the detection frame with the highest detection confidence is taken as the first detection frame and the comprehensive IoU with the other detection frames is computed, so that detection frames whose comprehensive IoU exceeds the corresponding threshold, i.e. detection frames with low accuracy, can be filtered out.
As a preferred embodiment, the calculating of the comprehensive IoU between the first detection frame and every other detection frame may include: calculating the IoU between the first detection frame and the other detection frame; calculating the center-point distance between the first detection frame and the other detection frame; calculating the feature difference value between the first detection frame and the other detection frame; and calculating the comprehensive IoU according to the IoU, the center-point distance and the feature difference value.
This preferred embodiment provides a method for calculating the comprehensive IoU between detection frames, i.e. the center-point distance and the appearance-feature difference between two detection frames are introduced on the basis of the traditional IoU. Specifically, the IoU, the center-point distance and the feature difference value between the first detection frame and another detection frame are calculated first, and the comprehensive IoU between the two is then obtained from these three quantities.
It can be appreciated that introducing the center-point distance into the traditional IoU takes into account not only the overlap between two detection frames but also the distance between their center points. The overlap alone cannot effectively distinguish redundant detection frames from frames of occluded targets: a frame of an occluded target often only partially overlaps the target frame, but the positions of the center points necessarily differ, so introducing the center-point distance strengthens the distinction between redundant frames and occlusion frames. On this basis, the feature difference value is further introduced, i.e. the features of the two detection frames are compared: the appearance features of detection frames of the same target are similar, so such frames are suppressed, whereas the appearance features of detection frames of different targets differ more, so such frames are retained.
As a preferred embodiment, the calculating of the center-point distance between the first detection frame and the other detection frame may include: calculating the Euclidean distance between the center point of the first detection frame and the center point of the other detection frame; determining the minimum enclosing rectangle of the first detection frame and the other detection frame, and calculating the diagonal length of the minimum enclosing rectangle; and calculating the center-point distance from the Euclidean distance and the diagonal length.
This preferred embodiment provides a method for calculating the center-point distance term of two detection frames, i.e. it is obtained from the Euclidean distance between the center points of the two detection frames and the diagonal length of their minimum enclosing rectangle:
Distance = ρ²(M, bk) / c²
wherein Distance is the center-point distance term between the detection frame M and the detection frame bk, ρ(M, bk) is the Euclidean distance between the center points of the detection frame M and the detection frame bk, and c is the diagonal length of the minimum enclosing rectangle of the detection frame M and the detection frame bk.
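A minimal NumPy sketch of this normalized center-distance term follows; the [x1, y1, x2, y2] box format is an assumption made for the example.

```python
import numpy as np

def center_distance_term(box_m: np.ndarray, box_k: np.ndarray) -> float:
    """Squared center distance normalized by the squared diagonal of the
    minimum enclosing rectangle (DIoU-style penalty term).
    Boxes are assumed to be [x1, y1, x2, y2]."""
    cm = np.array([(box_m[0] + box_m[2]) / 2.0, (box_m[1] + box_m[3]) / 2.0])
    ck = np.array([(box_k[0] + box_k[2]) / 2.0, (box_k[1] + box_k[3]) / 2.0])
    rho2 = float(np.sum((cm - ck) ** 2))          # squared Euclidean distance
    # Minimum enclosing rectangle of the two boxes.
    x1 = min(box_m[0], box_k[0]); y1 = min(box_m[1], box_k[1])
    x2 = max(box_m[2], box_k[2]); y2 = max(box_m[3], box_k[3])
    c2 = (x2 - x1) ** 2 + (y2 - y1) ** 2          # squared diagonal length
    return rho2 / c2 if c2 > 0 else 0.0
```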
As a preferred embodiment, the calculating of the feature difference value between the first detection frame and the other detection frame may include: performing feature extraction on the first detection frame to obtain a first detection feature; performing feature extraction on the other detection frame to obtain a second detection feature; and calculating the feature difference value according to the first detection feature and the second detection feature.
This preferred embodiment provides a method for calculating the feature difference value between detection frames, i.e. features are extracted from each detection frame and the difference between the two extracted features (the first detection feature and the second detection feature) is calculated; the feature difference value may specifically be the angle between the two feature vectors. The feature extraction of a detection frame may be implemented with a corresponding feature extraction network, such as a ReID (Person Re-identification) feature extraction network, a ResNet (Residual Neural Network) feature extraction network, or a VGG (Visual Geometry Group) feature extraction network.
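A minimal sketch of the appearance term follows; it assumes the per-frame feature vectors have already been produced by whichever extraction network is chosen, and only computes |cos θ| between them.

```python
import numpy as np

def appearance_similarity(feat_m: np.ndarray, feat_k: np.ndarray) -> float:
    """Return |cos(theta)| between two appearance feature vectors."""
    cos_theta = np.dot(feat_m, feat_k) / (np.linalg.norm(feat_m) * np.linalg.norm(feat_k) + 1e-12)
    return float(abs(cos_theta))
```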
Based on the above preferred embodiments, the comprehensive IoU between two detection frames can be computed as:
Feature_IoU = IoU(M, bk) - ρ²(M, bk)/c² + |cosθ| - 1
wherein Feature_IoU is the comprehensive IoU between the detection frame M and the detection frame bk, IoU(M, bk) is the intersection-over-union between the detection frame M and the detection frame bk, and θ is the angle between the feature vector of the detection frame M and that of the detection frame bk.
Further, the intersection-over-union IoU(M, bk) is computed as:
IoU(M, bk) = |M ∩ bk| / |M ∪ bk|
wherein M ∩ bk is the intersection of the detection frame M and the detection frame bk, and M ∪ bk is their union.
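Putting the three terms together, a hedged sketch of the comprehensive IoU computation could look as follows; it defines a plain IoU helper and reuses the center_distance_term and appearance_similarity helpers sketched above, again assuming [x1, y1, x2, y2] boxes.

```python
import numpy as np

def iou(box_m: np.ndarray, box_k: np.ndarray) -> float:
    """Plain intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1 = max(box_m[0], box_k[0]); iy1 = max(box_m[1], box_k[1])
    ix2 = min(box_m[2], box_k[2]); iy2 = min(box_m[3], box_k[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_m = (box_m[2] - box_m[0]) * (box_m[3] - box_m[1])
    area_k = (box_k[2] - box_k[0]) * (box_k[3] - box_k[1])
    union = area_m + area_k - inter
    return inter / union if union > 0 else 0.0

def feature_iou(box_m, box_k, feat_m, feat_k) -> float:
    """Comprehensive IoU: IoU - normalized center distance + |cos(theta)| - 1.

    Uses center_distance_term and appearance_similarity from the sketches above."""
    return (iou(box_m, box_k)
            - center_distance_term(box_m, box_k)
            + appearance_similarity(feat_m, feat_k)
            - 1.0)
```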
S104: determining a filtering threshold for each detection frame according to its detection confidence, and removing those other detection frames whose comprehensive IoU exceeds the filtering threshold;
This step performs detection frame filtering, i.e. the detection frames whose comprehensive IoU exceeds the filtering threshold are removed, where the filtering threshold is determined from the detection confidence of the corresponding detection frame. As stated above, traditional NMS deletes every detection frame whose IoU exceeds the NMS threshold. In post-filtering of detection results in a dense-occlusion scene, when the NMS threshold is set low, occluded targets with a high overlap are filtered out because their IoU exceeds the threshold and they are mistaken for redundant detection frames of the same target, so the recall rate of real targets decreases; when the NMS threshold is set high, the detection rate of occluded targets improves, but redundant detection frames are insufficiently filtered and false detections increase. In other words, during NMS filtering, the farther a redundant frame lies from the real target, the smaller the filtering threshold needed to remove it; the closer a redundant frame lies to the real target, the larger the filtering threshold needed, so that redundant frames are filtered while detection frames of adjacent real targets are preserved.
To solve this problem, a measurement score that reflects the degree of position deviation of a detection frame, namely the positioning confidence, is added to the confidence, which makes it possible to filter with different thresholds according to the deviation from the real target. Specifically, filtering thresholds of different sizes are set for detection confidences of different sizes: a larger filtering threshold is set when the detection confidence is larger, and a smaller filtering threshold is set when the detection confidence is smaller. A multi-threshold detection frame filtering method is thereby obtained, which makes the filtering result more accurate and further improves the accuracy of the target detection result.
As a preferred embodiment, the determining of the filtering threshold of each detection frame according to its detection confidence may include: when the detection confidence exceeds a preset threshold, taking a first threshold as the filtering threshold of the detection frame; when the detection confidence does not exceed the preset threshold, taking a second threshold as the filtering threshold of the detection frame; wherein the first threshold is greater than the second threshold.
This preferred embodiment provides a method for determining the filtering threshold, in which detection frame filtering is performed with two filtering thresholds. Specifically, a preset threshold is set in advance for the detection confidence; when the detection confidence exceeds the preset threshold, the first threshold is used as the filtering threshold, and when it does not, the second threshold is used as the filtering threshold, where the first threshold is necessarily larger than the second threshold. The specific values of the thresholds do not affect the implementation of the technical solution and are set by technicians according to the actual situation; the present application does not limit them.
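By way of illustration, the two-threshold rule can be expressed as the small function below; the concrete values 0.5, 0.65 and 0.45 are placeholders chosen for the example, not values prescribed by the application.

```python
def filtering_threshold(det_conf: float,
                        preset: float = 0.5,
                        first: float = 0.65,
                        second: float = 0.45) -> float:
    """Pick the filtering threshold for a detection frame from its detection confidence."""
    return first if det_conf > preset else second
```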
S105: judging whether the number of remaining detection frames is 0; if not, executing S106; if yes, executing S107;
S106: determining the remaining detection frames other than the first detection frame, and returning to S103;
S107: outputting all first detection frames.
The above steps perform cyclic filtering of the detection frames until only the actual detection frames of the preset detection objects, namely the first detection frames, are left. Specifically, after a round of detection frame filtering based on S103 and S104, it is judged whether the number of remaining detection frames is 0. If it is, all redundant detection frames have been filtered out, all retained first detection frames are actual detection frames of preset detection objects, and all first detection frames are output directly; otherwise, the process returns to S103 to continue the cyclic filtering until the number of remaining detection frames is 0 and all redundant detection frames have been filtered out. Target detection is thereby completed.
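The overall cyclic filtering procedure of S103 to S107 is sketched below as a single function; it reuses the feature_iou and filtering_threshold helpers sketched above, and the default threshold values remain illustrative assumptions.

```python
import numpy as np

def multi_threshold_nms(boxes, feats, det_conf, preset=0.5, first=0.65, second=0.45):
    """Iteratively keep the highest-confidence box and drop boxes whose
    comprehensive IoU with it exceeds the confidence-dependent threshold.

    boxes: (N, 4) array of [x1, y1, x2, y2]; feats: (N, D) appearance features;
    det_conf: (N,) fused detection confidences. Returns indices of kept boxes.
    """
    order = list(np.argsort(-det_conf))   # candidates, highest confidence first
    keep = []
    while order:
        m = order.pop(0)                  # first detection frame
        keep.append(m)
        remaining = []
        for k in order:
            f_iou = feature_iou(boxes[m], boxes[k], feats[m], feats[k])
            thr = filtering_threshold(det_conf[k], preset, first, second)
            if f_iou < thr:               # keep only boxes below their own threshold
                remaining.append(k)
        order = remaining
    return keep
```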
It can thus be seen that, in the target detection method provided by the application, a prediction branch for the positioning confidence of detection frames is added to a conventional detection model, and the detection confidence of each detection frame is obtained by combining its classification confidence and positioning confidence, so that detection frames with high confidence are positioned more accurately; on this basis, a multi-threshold filtering method is used to filter redundant detection frames, i.e. different filtering thresholds are set for different detection confidences, which effectively alleviates false detections and missed detections in dense-occlusion scenes, improves the accuracy of the target detection result, and improves the applicability of the detection algorithm in dense-occlusion scenes.
Based on the above embodiments, the embodiment of the present application provides another target detection method, whose implementation process is as follows:
Step 1, sample image acquisition and preprocessing:
Image data are collected, the collected images are annotated with detection frames, and preprocessing such as data augmentation and expansion is performed on the collected images; the whole image set is then divided into a training image set and a test image set.
Step 2, clustering of the training image set:
The detection frames annotated on the training image set are clustered with the K-means clustering algorithm to obtain the most likely detection target shapes, which serve as the anchors of the model.
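A minimal sketch of this anchor-clustering step is given below; clustering box widths and heights with plain Euclidean K-means is an assumption made for the example (YOLO-style pipelines often use an IoU-based distance instead).

```python
import numpy as np

def kmeans_anchors(wh: np.ndarray, k: int = 9, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Cluster ground-truth box (width, height) pairs into k anchor shapes.

    wh: (N, 2) array of box widths and heights. Returns (k, 2) anchor shapes."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each box to its nearest anchor center.
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned boxes.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = wh[labels == j].mean(axis=0)
    return centers
```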
Step 3, model structure design:
As shown in fig. 2, fig. 2 is a model structure diagram of the detection model provided by the present application: an IoU score prediction branch is added to the prediction part of YOLO v4 to predict the IoU positioning confidence. The Backbone is the backbone network, the Head is the network that produces the output content, and the Neck is arranged between the Backbone and the Head to make better use of the features extracted by the Backbone. In the output part, Classification is the classification confidence, Bbox is the detection frame, Objectness is the score that the detection frame contains a preset detection object, and IoU score prediction is the IoU positioning confidence.
Step 4, loss function design:
The loss function specifically includes the following losses:
(1) prediction box regression loss Lloc;
(2) confidence loss Lobjness;
(3) classification loss Lclass;
(4) positioning (IoU) loss LIoU.
The overall loss function is therefore:
Ltotal = Lloc + Lobjness + Lclass + LIoU
wherein K is the total number of grid cells of the output layer, each grid cell generates M candidate boxes, and each candidate box is regressed through the network to obtain a corresponding bounding box;
an indicator equals 1 when the j-th anchor box of the i-th grid cell is responsible for predicting this object, and 0 otherwise;
a complementary indicator marks the case where the j-th anchor box of the i-th grid cell is not responsible for predicting the target;
an objectness label indicates whether an object is predicted by the bounding box of the grid cell: its value is 1 if so and 0 otherwise;
when the j-th anchor box of the i-th grid cell is responsible for a real target, a class label represents the true class of that target: its value is 1 if the target belongs to class c and 0 otherwise;
and the IoU between the bounding box predicted by the grid cell and the real object is used in the positioning loss.
Step 5, model training:
The model with the structure of step 3 is trained on the training image set, with the total loss function of step 4 as the loss function. After the model loss has decreased and become stable, the current model is tested with the test image set, and the weights that perform best on the test set are selected as the final model, i.e. the detection model.
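As a sketch of this model-selection step (the checkpoint handling and the use of a test-set metric such as mAP are assumptions, not details specified by the application):

```python
def select_best_checkpoint(checkpoints, evaluate_metric):
    """Pick the weights that score highest on the test image set.

    checkpoints: iterable of candidate model weights; evaluate_metric: callable
    that returns a test-set score (e.g. mAP) for a given set of weights."""
    best_weights, best_score = None, float("-inf")
    for weights in checkpoints:
        score = evaluate_metric(weights)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```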
Step 6, image target detection:
(1) The image to be detected is input into the detection model to obtain the detection frames together with the classification confidence and the IoU positioning confidence of each detection frame;
(2) The detection confidence of each detection frame is obtained by fusing the classification confidence and the IoU positioning confidence, for example:
Sdet = (pm)^a · (IoUm)^(1-a)
wherein Sdet is the detection confidence, pm is the classification confidence of the target output by the m-th grid cell, IoUm is the IoU positioning confidence of the target output by the m-th grid cell, and a is a weight controlling the relative contribution of the two.
(3) The detection frame M with the highest detection confidence is selected, and the comprehensive IoU between M and every other detection frame bk is calculated:
First, referring to fig. 3 and fig. 4 (fig. 3 is a schematic diagram of the intersection of two detection frames, and fig. 4 of their union), the intersection-over-union between the detection frame M and the detection frame bk is:
IoU(M, bk) = |M ∩ bk| / |M ∪ bk|
wherein M ∩ bk is the intersection of the detection frame M and the detection frame bk, and M ∪ bk is their union.
Further, on the basis of IoU(M, bk), the center-point distance of the two detection frames is introduced. As shown in fig. 5 (a schematic diagram of the center-point distance of two detection frames, where d is the center-point distance and c is the diagonal length of the minimum enclosing rectangle of the two frames):
Distance = ρ²(M, bk) / c²
wherein Distance is the center-point distance term between the detection frame M and the detection frame bk, ρ(M, bk) is the Euclidean distance between their center points, and c is the diagonal length of their minimum enclosing rectangle; the distance-aware intersection-over-union is then DIoU = IoU(M, bk) - Distance.
Finally, on the basis of DIoU, the appearance-feature difference between the two detection frames is introduced:
Feature_IoU = DIoU + |cosθ| - 1
wherein θ is the angle between the feature vector of the detection frame M and that of the detection frame bk.
(4) A filtering threshold is determined from the detection confidence of each detection frame, and the detection frames are filtered based on the filtering threshold. The filtering rule is:
if Feature_IoU ≥ Nt1 and Sdet ≥ Ct, then
    B ← B - bk; S ← S - Sdet
else if Feature_IoU ≥ Nt2 and Sdet < Ct, then
    B ← B - bk; S ← S - Sdet
wherein B is the set of candidate detection frames, S is the set of their detection confidences, Sdet is the detection confidence of the detection frame bk, Ct is the preset confidence threshold, and Nt1 and Nt2 are the first and second filtering thresholds respectively (Nt1 > Nt2).
(5) Among the remaining detection frames, the detection frame with the highest detection confidence continues to be selected as the detection frame M, and the process returns to the comprehensive IoU calculation of step (3) and the detection frame filtering of step (4); this continues until all redundant detection frames have been filtered out by the cyclic filtering, and all first detection frames obtained are the actual detection frames of the preset detection objects.
It can be seen that, in the target detection method provided by this embodiment, a positioning-confidence output is added to the detection model, so that the model expresses the positioning confidence while predicting the classification confidence of the detection result; in the NMS processing stage, a DIoU fused with appearance features is introduced, so that detection frames of different targets are more easily distinguished; and a multi-threshold filtering strategy is adopted, so that the recall rate and the precision of the detection result are better balanced. Clearly, the method can effectively improve the target detection effect in dense scenes at the cost of only a small amount of extra time.
In order to solve the above technical problem, the present application further provides a target detection apparatus. Referring to fig. 6, fig. 6 is a schematic structural diagram of the target detection apparatus provided by the present application; the target detection apparatus may include:
an image processing module 1, configured to process an image to be detected with a detection model to obtain detection frames and, for each detection frame, a classification confidence and a positioning confidence;
a confidence calculation module 2, configured to calculate the detection confidence of each detection frame according to its classification confidence and positioning confidence;
a comprehensive IoU calculation module 3, configured to select the detection frame with the highest detection confidence as a first detection frame and calculate the comprehensive IoU between the first detection frame and every other detection frame;
a detection frame filtering module 4, configured to determine a filtering threshold for each detection frame according to its detection confidence and remove those other detection frames whose comprehensive IoU exceeds the filtering threshold;
a cyclic filtering module 5, configured to return, based on the detection frames remaining after the first detection frame is set aside, to selecting the detection frame with the highest detection confidence as a first detection frame and calculating the comprehensive IoU to filter detection frames, until no detection frame remains;
a detection frame output module 6, configured to output all first detection frames.
It can thus be seen that, in the target detection apparatus provided by the embodiment of the application, a prediction branch for the positioning confidence of detection frames is added to a conventional detection model, and the detection confidence of each detection frame is obtained by combining its classification confidence and positioning confidence, so that detection frames with high confidence are positioned more accurately; on this basis, a multi-threshold filtering method is used to filter redundant detection frames, i.e. different filtering thresholds are set for different detection confidences, which effectively alleviates false detections and missed detections in dense-occlusion scenes, improves the accuracy of the target detection result, and improves the applicability of the detection algorithm in dense-occlusion scenes.
As a preferred embodiment, the comprehensive IoU calculation module 3 may include:
an IoU calculation unit, configured to calculate the IoU between the first detection frame and the other detection frame;
a center-point distance calculation unit, configured to calculate the center-point distance between the first detection frame and the other detection frame;
a feature difference value calculation unit, configured to calculate the feature difference value between the first detection frame and the other detection frame;
a comprehensive IoU calculation unit, configured to calculate the comprehensive IoU according to the IoU, the center-point distance and the feature difference value.
As a preferred embodiment, the feature difference value calculation unit may be specifically configured to perform feature extraction on the first detection frame to obtain a first detection feature, perform feature extraction on the other detection frame to obtain a second detection feature, and calculate the feature difference value according to the first detection feature and the second detection feature.
As a preferred embodiment, the detection frame filtering module 4 may be specifically configured to take a first threshold as the filtering threshold of the detection frame when the detection confidence exceeds a preset threshold, and to take a second threshold as the filtering threshold when it does not; wherein the first threshold is greater than the second threshold.
As a preferred embodiment, the target detection apparatus may further include a model training module, configured to train an algorithm model with a training image set to obtain several initial detection models whose model loss satisfies a preset condition, screen all initial detection models with a test image set to obtain an optimal detection model, and take the optimal detection model as the detection model.
As a preferred embodiment, the model loss may include a box regression loss, a confidence loss, a classification loss and a positioning loss.
For the description of the device provided by the present application, please refer to the above method embodiment, and the description of the present application is omitted herein.
In order to solve the above technical problem, the present application further provides a target detection device. Referring to fig. 7, fig. 7 is a schematic structural diagram of the target detection device provided by the present application; the target detection device may include:
a memory 10, configured to store a computer program;
a processor 20, configured to execute the computer program to implement the steps of any one of the target detection methods described above.
For the description of the apparatus provided by the present application, please refer to the above method embodiment, and the description of the present application is omitted herein.
In order to solve the above technical problem, the present application also provides a computer-readable storage medium having a computer program stored thereon which, when executed by a processor, implements the steps of any one of the target detection methods described above.
The computer-readable storage medium may include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
For the description of the computer-readable storage medium provided by the present application, refer to the above method embodiments, and the disclosure is not repeated here.
In the description, each embodiment is described in a progressive manner, and each embodiment is mainly described by the differences from other embodiments, so that the same similar parts among the embodiments are mutually referred. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The technical scheme provided by the application is described in detail. The principles and embodiments of the present application have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the method of the present application and its core ideas. It should be noted that it will be apparent to those skilled in the art that the present application may be modified and practiced without departing from the spirit of the present application.
Claims (10)
1. A method of detecting an object, comprising:
processing an image to be detected with a detection model to obtain detection frames and, for each detection frame, a classification confidence and a positioning confidence;
obtaining a detection confidence of each detection frame according to its classification confidence and positioning confidence;
selecting the detection frame with the highest detection confidence as a first detection frame, and calculating a comprehensive intersection-over-union between the first detection frame and each other detection frame;
determining a filtering threshold of the corresponding detection frame according to the detection confidence, and removing the other detection frames whose comprehensive intersection-over-union exceeds the filtering threshold;
based on the detection frames remaining other than the first detection frame, returning to the step of selecting the detection frame with the highest detection confidence as a first detection frame and calculating the comprehensive intersection-over-union between the first detection frame and the other detection frames to filter detection frames, until no detection frame remains; and
outputting all first detection frames.
2. The method according to claim 1, wherein the calculating of the comprehensive intersection-over-union between the first detection frame and each of the other detection frames comprises:
calculating the intersection-over-union between the first detection frame and the other detection frame;
calculating the center-point distance between the first detection frame and the other detection frame;
calculating the feature difference value between the first detection frame and the other detection frame; and
calculating the comprehensive intersection-over-union according to the intersection-over-union, the center-point distance and the feature difference value.
3. The method according to claim 2, wherein the calculating of the feature difference value between the first detection frame and the other detection frame comprises:
performing feature extraction on the first detection frame to obtain a first detection feature;
performing feature extraction on the other detection frame to obtain a second detection feature; and
calculating the feature difference value according to the first detection feature and the second detection feature.
4. The method according to claim 1, wherein determining the filtering threshold for the corresponding detection frame according to the detection confidence comprises:
when the detection confidence exceeds a preset threshold, taking a first threshold as the filtering threshold of the corresponding detection frame; and
when the detection confidence does not exceed the preset threshold, taking a second threshold as the filtering threshold of the corresponding detection frame, wherein the first threshold is greater than the second threshold.
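Editor's note: claim 4 only constrains the ordering of the two thresholds. A minimal sketch with placeholder numbers could look like this; using the larger threshold for confident detections makes the suppression more lenient around them, which presumably helps retain overlapping true objects in crowded scenes.

```python
def pick_threshold(det_conf, preset=0.5, first_thr=0.65, second_thr=0.45):
    """Two-level filtering threshold in the spirit of claim 4.

    The numeric values are placeholders; the only requirement stated in the
    claim is first_thr > second_thr.
    """
    return first_thr if det_conf > preset else second_thr
```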
5. The method according to claim 1, wherein the training process of the detection model comprises:
training an algorithm model with a training image set to obtain a plurality of initial detection models whose model loss satisfies a preset condition; and
screening all of the initial detection models with a test image set to obtain an optimal detection model, and using the optimal detection model as the detection model.
6. The method according to claim 5, wherein the model loss comprises a box regression loss, a confidence loss, a classification loss, and a positioning loss.
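Editor's note: claim 6 names four loss terms but not their functional forms. The sketch below assembles a total loss from commonly used stand-ins (smooth L1 for box regression, cross-entropy for classification, binary cross-entropy for the two confidence-style terms); the loss choices, tensor layout, and equal weights are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def total_model_loss(pred, target, weights=(1.0, 1.0, 1.0, 1.0)):
    """Sum of the four loss terms named in claim 6, using illustrative stand-ins.

    pred / target are dicts of tensors:
      "boxes"      (N, 4) regressed vs. target boxes      -> box regression loss
      "objectness" (N,)   objectness logits vs. labels    -> confidence loss
      "cls_logits" (N, C) class logits vs. "labels" (N,)  -> classification loss
      "loc_conf"   (N,)   positioning logits vs. "iou"    -> positioning loss
    """
    box_loss = F.smooth_l1_loss(pred["boxes"], target["boxes"])
    conf_loss = F.binary_cross_entropy_with_logits(pred["objectness"], target["objectness"])
    cls_loss = F.cross_entropy(pred["cls_logits"], target["labels"])
    loc_loss = F.binary_cross_entropy_with_logits(pred["loc_conf"], target["iou"])
    w = weights
    return w[0] * box_loss + w[1] * conf_loss + w[2] * cls_loss + w[3] * loc_loss
```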
7. A target detection apparatus, comprising:
an image processing module, configured to process an image to be detected with a detection model to obtain detection frames and, for each detection frame, a classification confidence and a positioning confidence;
a confidence calculation module, configured to calculate a detection confidence of the corresponding detection frame from the classification confidence and the positioning confidence;
a comprehensive cross-over ratio calculation module, configured to select the detection frame with the largest detection confidence as a first detection frame and calculate a comprehensive cross-over ratio between the first detection frame and each of the other detection frames;
a detection frame filtering module, configured to determine a filtering threshold for the corresponding detection frame according to the detection confidence and eliminate the other detection frames whose comprehensive cross-over ratio exceeds the filtering threshold;
a cyclic filtering module, configured to, for the remaining detection frames other than the first detection frame, return to selecting the detection frame with the largest detection confidence as the first detection frame and calculating the comprehensive cross-over ratio between the first detection frame and the other detection frames, so as to filter detection frames until no detection frames remain; and
a detection frame output module, configured to output all of the first detection frames.
8. The target detection apparatus according to claim 7, wherein the comprehensive cross-over ratio calculation module comprises:
an intersection-over-union calculation unit, configured to calculate an intersection-over-union ratio between the first detection frame and the other detection frames;
a center point distance calculation unit, configured to calculate a center point distance between the first detection frame and the other detection frames;
a feature difference value calculation unit, configured to calculate a feature difference value between the first detection frame and the other detection frames; and
a comprehensive cross-over ratio calculation unit, configured to calculate the comprehensive cross-over ratio from the intersection-over-union ratio, the center point distance, and the feature difference value.
9. A target detection device, comprising:
a memory for storing a computer program; and
a processor for implementing the steps of the target detection method according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the target detection method according to any one of claims 1 to 6.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111445714.XA CN114120127B (en) | 2021-11-30 | 2021-11-30 | Target detection method, device and related equipment |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN114120127A CN114120127A (en) | 2022-03-01 |
| CN114120127B true CN114120127B (en) | 2024-06-07 |
Family
ID=80368531
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111445714.XA Active CN114120127B (en) | 2021-11-30 | 2021-11-30 | Target detection method, device and related equipment |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114120127B (en) |
Families Citing this family (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114417963B (en) * | 2021-12-09 | 2025-05-13 | 浙江大华技术股份有限公司 | Method, system, electronic device and storage medium for determining false detection targets |
| CN114821144A (en) * | 2022-05-19 | 2022-07-29 | 昆仑芯(北京)科技有限公司 | Method, device, equipment and medium for implementing data processing algorithm |
| CN115115825B (en) * | 2022-05-27 | 2024-05-03 | 腾讯科技(深圳)有限公司 | Method, device, computer equipment and storage medium for detecting object in image |
| CN115100499A (en) * | 2022-06-09 | 2022-09-23 | 北京钢铁侠科技有限公司 | Image prediction frame filtering method and device, storage medium and equipment |
| CN116597417B (en) * | 2023-05-16 | 2024-08-13 | 北京斯年智驾科技有限公司 | Obstacle movement track determining method, device, equipment and storage medium |
| CN116823533B (en) * | 2023-06-25 | 2024-01-26 | 广东盈香生态产业集团有限公司 | Intelligent visit guiding method and system for ecological garden |
| CN116543189B (en) * | 2023-06-29 | 2023-09-26 | 天津所托瑞安汽车科技有限公司 | Target detection method, device, equipment and storage medium |
| CN117115427A (en) * | 2023-08-29 | 2023-11-24 | 上海云骥跃动智能科技发展有限公司 | Method and device for eliminating redundant detection frames in target detection |
| CN118247765B (en) * | 2024-01-16 | 2024-10-29 | 广东六力智行科技有限公司 | Panoramic object detection method, device, vehicle and storage medium |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109886083A (en) * | 2019-01-03 | 2019-06-14 | 杭州电子科技大学 | A real-time scene small face detection method based on deep learning |
| CN110378232A (en) * | 2019-06-20 | 2019-10-25 | 陕西师范大学 | The examination hall examinee position rapid detection method of improved SSD dual network |
| CN110532985A (en) * | 2019-09-02 | 2019-12-03 | 北京迈格威科技有限公司 | Object detection method, apparatus and system |
| CN110909591A (en) * | 2019-09-29 | 2020-03-24 | 浙江大学 | Adaptive Non-Maximum Suppression Processing Method for Pedestrian Image Detection Using Coded Vectors |
| KR20200036079A (en) * | 2018-09-18 | 2020-04-07 | 전남대학교산학협력단 | System and Method for Detecting Deep Learning based Human Object using Adaptive Thresholding Method of Non Maximum Suppression |
| CN111368600A (en) * | 2018-12-26 | 2020-07-03 | 北京眼神智能科技有限公司 | Remote sensing image target detection and identification method, device, readable storage medium and device |
| CN112102256A (en) * | 2020-08-22 | 2020-12-18 | 复旦大学 | Narrow-band endoscopic image-oriented cancer focus detection and diagnosis system for early esophageal squamous carcinoma |
| CN112365498A (en) * | 2020-12-10 | 2021-02-12 | 南京大学 | Automatic detection method for multi-scale polymorphic target in two-dimensional image sequence |
| CN112819821A (en) * | 2021-03-01 | 2021-05-18 | 南华大学 | Cell nucleus image detection method |
| CN113158957A (en) * | 2021-04-30 | 2021-07-23 | 武汉寒霜科技有限公司 | Transformer substation helmet and work clothes detection method and device |
| CN113378969A (en) * | 2021-06-28 | 2021-09-10 | 北京百度网讯科技有限公司 | Fusion method, device, equipment and medium of target detection results |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190188729A1 (en) * | 2017-12-18 | 2019-06-20 | Beijing Jingdong Shangke Information Technology Co., Ltd. | System and method for detecting counterfeit product based on deep learning |
- 2021-11-30: CN CN202111445714.XA patent CN114120127B (en), status Active
Non-Patent Citations (1)
| Title |
|---|
| Adaptive NMS: Refining Pedestrian Detection in a Crowd; Songtao Liu; IEEE Xplore; 2019-12-31; full text * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN114120127A (en) | 2022-03-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN114120127B (en) | Target detection method, device and related equipment | |
| CN107133948B (en) | Image blurring and noise evaluation method based on multitask convolution neural network | |
| CN110874577B (en) | Automatic verification method of certificate photo based on deep learning | |
| CN108074244B (en) | Safe city traffic flow statistical method integrating deep learning and background difference method | |
| CN109784386A (en) | A detection method assisted by semantic segmentation | |
| CN105574550A (en) | Vehicle identification method and device | |
| CN107578021A (en) | Pedestrian detection method, apparatus and system based on deep learning network | |
| KR101908481B1 (en) | Device and method for pedestraian detection | |
| CN101364263A (en) | Method and system for detecting skin texture to image | |
| CN110929635B (en) | Fake face video detection method and system based on facial intersection and comparison under trust mechanism | |
| CN111027347A (en) | Video identification method and device and computer equipment | |
| CN113569756A (en) | Abnormal behavior detection and positioning method, system, terminal equipment and readable storage medium | |
| CN101114337A (en) | A method for identifying and locating buildings on the ground | |
| CN117274355A (en) | Drainage pipeline flow intelligent measurement method based on acceleration guidance area convolutional neural network and parallel multi-scale unified network | |
| CN113158777A (en) | Quality scoring method, quality scoring model training method and related device | |
| CN109117746A (en) | Hand detection method and machine readable storage medium | |
| CN114155411A (en) | Intelligent detection and identification method for small and weak targets | |
| CN116665016B (en) | Single-frame infrared dim target detection method based on improved YOLOv5 | |
| CN112098997B (en) | Three-dimensional holographic imaging security inspection radar image foreign matter detection method | |
| CN118279758A (en) | Remote sensing image ship detection method based on sea-land segmentation | |
| CN119360452B (en) | Target tracking method and device for small fast moving target | |
| CN119296143B (en) | Fingerprint feature recognition and analysis method based on directional field guidance and spatial attention technology | |
| CN111753775A (en) | Fish growth assessment method, device, equipment and storage medium | |
| CN117649415B (en) | Cell balance analysis method based on optical flow diagram detection | |
| CN113762478A (en) | Radio frequency interference detection model, radio frequency interference detection method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |