Background
Obtaining the relative position and relative attitude (hereinafter referred to as relative pose) between spacecraft is an important prerequisite for the smooth execution of space interaction tasks such as spacecraft pointing and tracking, and the detection and identification of local spacecraft components (solar wings, antennas, and the like) is a key enabling technology. Accurately identifying the category of a target component and effectively detecting high-level feature information of local components, such as edge contours, corner points, and dimensions, provides powerful data support for accurate estimation of the relative pose between spacecraft.
When a target detection task is performed on a spacecraft in space, there are two main characteristics and difficulties: 1. An on-orbit spacecraft is in real-time, high-speed motion and its attitude changes constantly, so the overall shape and apparent size of the spacecraft vary greatly and its components are often partially occluded. 2. The continuous change of space illumination conditions, jitter of the imaging payload, and image-quality degradation of the imaging system cause severe noise pollution, low contrast, and low average brightness in space images. These problems greatly limit the detection accuracy of local spacecraft components.
Referring to fig. 1, the CenterMask detection model is composed of a feature extractor, a target detection head, and a mask generation head. The backbone network of the feature extractor is VoVNetV2, in which a residual structure and an eSE (effective Squeeze-Excitation) attention module are merged into VoVNet; the target detection head consists of a category prediction branch, a centrality prediction branch, and a bounding box regression branch. The authors of CenterMask also proposed a spatial attention mechanism (SAM) for directing the mask generation branch to highlight pixels carrying valid information and to suppress pixels without valid information. In CenterMask, the feature extractor first performs six stages of down-sampling on the input image and outputs the feature maps P3 to P7. The detection head then performs class prediction, bounding box regression, and centrality prediction on the feature map output at each level. Finally, the mask generation head produces the image segmentation result. However, although CenterMask introduces an attention mechanism in the mask generation head, it still pays insufficient attention to inter-channel information, which affects detection accuracy.
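For illustration only, the following is a minimal sketch of a spatial attention module of the kind described above, assuming the common CBAM-style formulation (channel-wise average and max pooling, a small convolution, and a Sigmoid gate); the kernel size and channel count are assumptions and not the exact CenterMask implementation.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Sketch of a SAM-like spatial attention gate (assumed formulation)."""
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        # 2 input channels: the avg-pooled and max-pooled maps stacked together
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = x.mean(dim=1, keepdim=True)         # B x 1 x H x W
        max_map = x.max(dim=1, keepdim=True).values   # B x 1 x H x W
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                               # highlight informative pixels

# quick check
feat = torch.randn(1, 256, 14, 14)
print(SpatialAttention()(feat).shape)  # torch.Size([1, 256, 14, 14])
```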
Therefore, it is desirable to provide an improved detection model.
Disclosure of Invention
The invention aims to provide a spacecraft multi-local-component detection method based on an instance segmentation network, which can obtain the category information, bounding box information, and contour information of a target component more accurately, thereby improving the precision and speed of component detection.
In order to achieve the above object, the present invention provides a method for detecting multiple local components of a spacecraft based on an instance segmentation network, comprising the following steps:
Step one: performing feature extraction on the spacecraft input image to obtain feature maps S3 to S5;
Step two: inputting S3 to S5 into the FPN structure to obtain multi-scale feature maps P3 to P6, where the process is defined as follows:
P5=Conv1×1(S5)
P6=Maxpooling(P5)
Pi=Upsample(Si+1)+Conv1×1(Si),i=3,4
wherein Conv1×1 represents a convolution layer with a convolution kernel size of 1 × 1, and Upsample represents a non-linear upsampling layer;
Step three: inputting the multi-scale feature maps P3 to P6 into the target detection head to detect the local components of the spacecraft, thereby obtaining prediction results for the category, bounding box, and centrality of the target;
Step four: optimizing the bounding boxes according to the category and centrality information to obtain optimized bounding boxes;
Step five: inputting the optimized bounding boxes into the mask generation branch for instance segmentation to obtain mask information of the local components of the spacecraft, and further obtaining the contour information of the components.
Wherein, step one specifically includes: performing feature extraction on the input image with VoVNetV2 to obtain the feature maps S3 to S5.
Wherein, step four specifically includes: multiplying the classification score of each bounding box obtained in step three by the predicted centrality to obtain a final evaluation score for each bounding box, wherein bounding boxes located far from the center of the object receive lower scores and bounding boxes near the center of the object receive higher scores; these scores are used to sort and screen the bounding boxes to obtain the optimized bounding boxes.
In step four, the bounding boxes are screened with a non-maximum suppression (NMS) method.
Wherein, step five specifically includes: inputting the optimized bounding-box feature map obtained in step four into the SCAM mask branch to obtain an information-reinforced feature map; then predicting the class of each pixel with a 1 × 1 convolution to generate a class-specific mask; after the mask information of the spacecraft local component is obtained with a dimension of 28 × 28 × 2, the contour information of the component can be further obtained.
The invention has the following beneficial effects: the detection method can detect the category information, bounding box information, and mask information of local components of a target spacecraft with a small number of samples. First, the feature extraction layers of the anchor-free detector FCOS are reduced and the correlation between the centrality prediction branch and the bounding box regression branch is strengthened, improving the component detection precision. Then, a spatial-channel attention mechanism is designed and introduced into the mask generation branch of CenterMask, improving the segmentation precision of the components. Experimental results show that the proposed local component detection and segmentation method outperforms the original network model in both detection precision and speed.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The invention provides a spacecraft multi-local-component detection method based on an instance segmentation network, which comprises the following steps:
Step one: performing feature extraction on the spacecraft input image to obtain feature maps S3 to S5;
Step two: inputting S3 to S5 into the FPN structure to obtain multi-scale feature maps P3 to P6, where the process is defined as follows:
P5=Conv1×1(S5)
P6=Maxpooling(P5)
Pi=Upsample(Si+1)+Conv1×1(Si),i=3,4
wherein Conv1×1 represents a convolution layer with a convolution kernel size of 1 × 1, and Upsample represents a non-linear upsampling layer;
Step three: inputting the multi-scale feature maps P3 to P6 into the target detection head to detect the local components of the spacecraft, thereby obtaining prediction results for the category, bounding box, and centrality of the target;
Step four: optimizing the bounding boxes according to the category and centrality information to obtain optimized bounding boxes;
Step five: inputting the optimized bounding boxes into the mask generation branch for instance segmentation to obtain mask information of the local components of the spacecraft, and further obtaining the contour information of the components.
In a specific embodiment, the invention first uses VoVNetV2 to perform five-stage feature extraction on the input image and outputs feature maps S2 to S5. To reduce the computational cost of the model and improve detection efficiency, only S3 to S5 are input into the FPN structure. The high-resolution feature map S3 enables the detector to better detect small components such as antennas, while the feature map S5, with a wider receptive field, enables the detector to better detect large components such as the solar wing.
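For illustration, the following is a minimal sketch of the step-two feature fusion producing P3 to P6, assuming 256 FPN channels, backbone channel counts of (512, 768, 1024) for S3 to S5, and nearest-neighbour upsampling; to keep channel counts consistent, the upsampled term is taken as the already-fused higher-level map, which is an assumption about the formulas above rather than the exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Sketch of the top-down fusion P3-P6 under the stated assumptions."""
    def __init__(self, in_channels=(512, 768, 1024), out_channels=256):
        super().__init__()
        # one 1x1 lateral convolution per backbone map S3, S4, S5
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )

    def forward(self, s3, s4, s5):
        p5 = self.lateral[2](s5)                          # P5 = Conv1x1(S5)
        p6 = F.max_pool2d(p5, kernel_size=1, stride=2)    # P6 = Maxpooling(P5)
        p4 = F.interpolate(p5, scale_factor=2, mode="nearest") + self.lateral[1](s4)
        p3 = F.interpolate(p4, scale_factor=2, mode="nearest") + self.lateral[0](s3)
        return p3, p4, p5, p6

fpn = SimpleFPN()
s3, s4, s5 = (torch.randn(1, c, s, s) for c, s in [(512, 80), (768, 40), (1024, 20)])
print([p.shape[-1] for p in fpn(s3, s4, s5)])  # [80, 40, 20, 10]
```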
In a specific embodiment, the invention constructs a CNN-based detection model SCD (Satellite Components Detection), initializes the model parameters with the parameters of a CenterMask pre-trained on the MS-COCO dataset, and initializes the parameters that differ between CenterMask and SCD with a standard normal distribution. Model training, namely transfer learning, is then performed on the constructed small-sample training set, and the trained and optimized SCD model is finally obtained. With this detection model, automatic and accurate detection of the local components of the target spacecraft can be realized, yielding the category, bounding box, and mask information of the target components.
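The initialization strategy can be illustrated with the following minimal sketch, in which the function name and file path are hypothetical: parameters whose names and shapes match the MS-COCO pre-trained CenterMask weights are copied over, and the remaining SCD-specific parameters are drawn from a standard normal distribution.

```python
import torch

def init_from_pretrained(model: torch.nn.Module, pretrained_state: dict) -> None:
    """Copy matching pre-trained weights; initialize the rest from N(0, 1)."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in pretrained_state and pretrained_state[name].shape == param.shape:
                param.copy_(pretrained_state[name])   # reuse pre-trained CenterMask weights
            else:
                param.normal_(mean=0.0, std=1.0)      # standard normal for SCD-specific layers

# usage sketch with a toy module (the real SCD/CenterMask models are placeholders):
src = torch.nn.Linear(4, 2)
dst = torch.nn.Linear(4, 2)
init_from_pretrained(dst, src.state_dict())
print(torch.allclose(dst.weight, src.weight))  # True
# e.g. init_from_pretrained(scd_model, torch.load("centermask_mscoco.pth"))
```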
In one specific embodiment, the local component detector is mainly composed of three parts: a backbone network, a feature pyramid, and a target detection head. In this embodiment, VoVNetV2 is used as the backbone network; merging a residual structure and an eSE attention module into VoVNetV2 not only captures multiple receptive fields efficiently but also clarifies the interdependence between feature-map channels and enhances the representation of feature information. The details of the backbone network are shown in Table 1. The target detection head consists of three branches: a classification prediction branch, a bounding box regression branch, and a centrality prediction branch. The classification prediction branch predicts the confidence of each class, and the class with the highest confidence is taken as the predicted class; the bounding box regression branch predicts the offsets of the four boundaries of the bounding box (left, right, top, and bottom) relative to a given location. Since the centrality is correlated with the bounding box offsets (an accurate centrality can be computed from the offsets when they are close to the true values) and is essentially uncorrelated with the classification task, in this embodiment the centrality prediction branch is placed in parallel with the bounding box regression branch. This not only strengthens the correlation between the two but also reduces the model parameters by sharing convolution layers.
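For illustration, the following is a minimal sketch of the head layout described above, assuming an FCOS-style design with 256 channels, a tower depth of 4, and 2 component classes (all assumptions); the centrality branch shares the regression tower so that it stays parallel to box regression.

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Sketch of a detection head with a centrality branch parallel to box regression."""
    def __init__(self, channels: int = 256, num_classes: int = 2, depth: int = 4):
        super().__init__()
        def tower():
            layers = []
            for _ in range(depth):
                layers += [nn.Conv2d(channels, channels, 3, padding=1),
                           nn.GroupNorm(32, channels), nn.ReLU(inplace=True)]
            return nn.Sequential(*layers)
        self.cls_tower = tower()
        self.reg_tower = tower()
        self.cls_pred = nn.Conv2d(channels, num_classes, 3, padding=1)  # class confidences
        self.box_pred = nn.Conv2d(channels, 4, 3, padding=1)            # left/right/top/bottom offsets
        self.ctr_pred = nn.Conv2d(channels, 1, 3, padding=1)            # centrality score

    def forward(self, feat: torch.Tensor):
        reg_feat = self.reg_tower(feat)
        cls_logits = self.cls_pred(self.cls_tower(feat))
        box_offsets = self.box_pred(reg_feat)
        centrality = self.ctr_pred(reg_feat)   # shares the regression tower
        return cls_logits, box_offsets, centrality

p3 = torch.randn(1, 256, 80, 80)
print([t.shape[1] for t in DetectionHead()(p3)])  # [2, 4, 1]
```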
TABLE 1 backbone network architecture
In one specific embodiment, the detection flow with optimized bounding boxes for target components is as follows. During prediction, the detected category scores, centrality, and bounding box information are obtained through the feature extraction structure and the target detection head. Then, the classification score of each bounding box is multiplied by the predicted centrality to obtain a final evaluation score for each bounding box; bounding boxes far from the center of the object receive lower scores, and bounding boxes near the center receive higher scores. The bounding boxes are sorted by these scores and then screened with a Non-Maximum Suppression (NMS) method to obtain the optimized bounding boxes. This method significantly improves the target detection performance of the model; the optimized bounding boxes are input into the mask generation branch for instance segmentation, further improving the segmentation precision.
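This box-optimization step can be sketched as follows: each box's classification score is multiplied by its predicted centrality, the boxes are ranked by the product, and NMS removes overlapping duplicates. The use of torchvision and the IoU threshold of 0.6 are assumptions for illustration.

```python
import torch
from torchvision.ops import nms

def optimize_boxes(boxes, cls_scores, centrality, iou_thr: float = 0.6):
    """boxes: N x 4 (x1, y1, x2, y2); cls_scores, centrality: N."""
    scores = cls_scores * centrality          # boxes far from the object centre score low
    order = scores.argsort(descending=True)
    boxes, scores = boxes[order], scores[order]
    keep = nms(boxes, scores, iou_thr)        # non-maximum suppression
    return boxes[keep], scores[keep]

boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.], [50., 50., 60., 60.]])
cls_scores = torch.tensor([0.9, 0.8, 0.7])
centrality = torch.tensor([0.95, 0.5, 0.9])
print(optimize_boxes(boxes, cls_scores, centrality)[0].shape)  # torch.Size([2, 4])
```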
In one specific embodiment, the spatial-channel attention mechanism applies attention of different strengths to enhance or suppress features along the channel dimension and retains more channel feature information, with the aim of directing the mask generation branch to focus on objects with salient features across the different channels.
In a specific embodiment, the input feature map of the spatial-channel attention module (SCAM) has dimensions W × H × C. After a single convolution layer, an attention map with dimensions W × H × C is generated; after Sigmoid activation, the elements of the attention map are mapped to [0, 1]; finally, the activated attention map is multiplied element-wise with the original input feature map, so that information-rich features are enhanced and features without valid information are suppressed.
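The SCAM behaviour described above can be sketched as follows: a single convolution produces a W × H × C attention map, a Sigmoid maps it to [0, 1], and the result gates the input feature map element by element. The kernel size (3) is an assumption for illustration.

```python
import torch
import torch.nn as nn

class SCAM(nn.Module):
    """Sketch of the spatial-channel attention gate (single conv + Sigmoid)."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # single-layer convolution keeping the full channel dimension C
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = torch.sigmoid(self.conv(x))   # W x H x C attention map in [0, 1]
        return x * attn                      # enhance informative features, suppress the rest

feat = torch.randn(1, 256, 14, 14)
print(SCAM(256)(feat).shape)  # torch.Size([1, 256, 14, 14])
```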
In a specific embodiment, in the prediction stage, the feature map with high-quality bounding boxes is first input into the SCAM mask branch to obtain an information-reinforced feature map; then the class of each pixel is predicted with a 1 × 1 convolution to generate a class-specific mask with dimensions 28 × 28 × 2, completing the spacecraft local component segmentation task.
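The following minimal sketch of this mask branch reuses the SCAM module sketched above and assumes 14 × 14 RoI features with a transposed convolution reaching the 28 × 28 output resolution (the upsampling layer is an assumption); the final 1 × 1 convolution predicts a per-pixel class for the 2 component types.

```python
import torch
import torch.nn as nn

class MaskHead(nn.Module):
    """Sketch of the step-five mask branch: SCAM gating, upsampling, per-pixel classes."""
    def __init__(self, channels: int = 256, num_classes: int = 2):
        super().__init__()
        self.scam = SCAM(channels)                              # information-reinforced features
        self.upsample = nn.ConvTranspose2d(channels, channels, 2, stride=2)
        self.relu = nn.ReLU(inplace=True)
        self.mask_pred = nn.Conv2d(channels, num_classes, 1)    # 1x1 conv, per-pixel class

    def forward(self, roi_feat: torch.Tensor) -> torch.Tensor:
        x = self.scam(roi_feat)
        x = self.relu(self.upsample(x))
        return self.mask_pred(x)                                # B x 2 x 28 x 28 masks

roi = torch.randn(8, 256, 14, 14)                               # 8 optimized boxes
print(MaskHead()(roi).shape)                                    # torch.Size([8, 2, 28, 28])
```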
Table 2 compares the detection and segmentation performance of several models. In terms of detection accuracy, the SCD provided by the present invention achieves the best APbox (average precision of bounding box detection) and APmask (average precision of mask segmentation), which are 2.5% and 1.5% higher than CM (CenterMask) and 4.9% and 2.5% higher than MR (Mask R-CNN), respectively, while its detection speed is substantially the same as that of both methods. The SU-SCD (Speed-Up SCD) provided by the invention surpasses CM-Lite (CenterMask-Lite) in both detection speed and accuracy for the solar wing and the antenna: compared with CM-Lite, SU-SCD improves APbox and APmask by 1.8% and 0.8%, respectively, while being 0.5 FPS faster; compared with CM and MR, SU-SCD is 5 FPS faster.
TABLE 2 multiple model detection and segmentation Performance comparison
The invention has the following beneficial effects: the detection method can detect the category information, bounding box information, and mask information of local components of a target spacecraft with a small number of samples. First, the feature extraction layers of the anchor-free detector FCOS are reduced and the correlation between the centrality prediction branch and the bounding box regression branch is strengthened, improving the component detection precision. Then, a spatial-channel attention mechanism is designed and introduced into the mask generation branch of CenterMask, improving the segmentation precision of the components. Experimental results show that the proposed local component detection and segmentation method outperforms the original network model in both detection precision and speed.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.