CN113269197A

CN113269197A - Certificate image vertex coordinate regression system and identification method based on semantic segmentation

Info

Publication number: CN113269197A
Application number: CN202110451208.5A
Authority: CN
Inventors: 戚朕; 章水鑫; 周源赣
Original assignee: Nanjing Sanbaiyun Information Technology Co ltd
Current assignee: Nanjing Sanbaiyun Information Technology Co ltd
Priority date: 2021-04-25
Filing date: 2021-04-25
Publication date: 2021-08-17
Anticipated expiration: 2041-04-25
Also published as: CN113269197B

Abstract

The invention discloses a certificate image vertex coordinate regression system based on semantic segmentation. The system comprises a coordinate regression module arranged after a semantic segmentation module. With any of a variety of semantic segmentation networks serving as the backbone network, the coordinate regression module directly obtains the polygon vertex coordinates of a certificate area, and finally obtains the coordinates of each vertex from the index of the maximum value in a heatmap.

Description

Certificate image vertex coordinate regression system and identification method based on semantic segmentation

The technical field is as follows:

the invention belongs to the technical field of image target detection, and particularly relates to a certificate image vertex coordinate regression system and a certificate image vertex coordinate identification method based on semantic segmentation.

Background art:

Semantic segmentation of certificate images generally proceeds as follows: a set of certificate pictures (including but not limited to motor vehicle registration certificates, driving certificates, contracts and the like) is collected and manually labeled, sub-regions of different classes are marked in the pictures, and a deep neural network semantic segmentation model is trained on the pictures and the labeled data. In application, a new picture is input into the model and the segmentation result for the target sub-regions is obtained automatically. The segmentation result has the same size as the input picture, and the value of each pixel is the index of the class (such as certificate or background) to which that pixel belongs. The vertex coordinates of a sub-region are then calculated by an image post-processing algorithm, for example by using OpenCV (an open-source computer vision library) to compute the minimum circumscribed polygon; the structure is shown in FIG. 1. The defects of the conventional technology are specifically: (1) when a general semantic segmentation model is used to segment the certificate area, post-processing operations such as noise removal and solving for the minimum circumscribed rectangle are needed to obtain the polygon vertex coordinates of the certificate; these operations cannot be accelerated on the GPU together with the neural network and are easily disturbed by errors in the segmentation result; (2) when the number and quality of the training pictures are insufficient, a model with strong generalization capability cannot be obtained, and robustness to occlusion is limited; (3) using a multi-class mask as the only label makes it difficult for a simple model to learn accurate edge and vertex information.

The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.

The invention content is as follows:

the invention aims to provide a certificate image vertex coordinate regression system based on semantic segmentation, thereby overcoming the defects in the prior art.

In order to achieve the purpose, the invention provides a certificate image vertex coordinate regression system based on semantic segmentation, which comprises a coordinate regression module, wherein the coordinate regression module is arranged behind the semantic segmentation module.

Preferably, in the above technical solution, the semantic segmentation module performs semantic segmentation according to the backbone network output structure by using a semantic segmentation model;

and the coordinate regression module is used for directly obtaining all vertex coordinates of the certificate area through network calculation.

Preferably, in the above technical solution, the semantic segmentation module adopts DeepLab, UNet, PSPNet or another semantic segmentation model.

Preferably, in the above technical solution, the coordinate regression module includes a feature map convolution module, a mask attention module, an attention fusion module, and a heatmap calculation module;

the feature map convolution module is connected to the network layer of the semantic segmentation module that is used to calculate the final segmentation result; it comprises a plurality of 3x3 convolution layers, ReLU activation layers and Batch Normalization layers, produces an output whose width and height are the same as those of the input feature map and whose number of channels equals the number of vertex coordinates to be regressed, and sends this output to the attention fusion module;

the mask attention module is connected to the semantic segmentation result output of the semantic segmentation module; after a plurality of 3x3 convolution, ReLU activation and Batch Normalization layers, global pooling is performed to obtain an intermediate feature layer with a width and height of 1 and a number of channels equal to the number of vertex coordinates to be regressed; this layer is passed through two fully connected layers to give the output of the module, which is sent to the attention fusion module;

the attention fusion module adds the output feature maps of the feature map convolution module and the mask attention module pixel by pixel, passes the sum through a sigmoid activation layer, and outputs the result to the heatmap calculation module;

the input of the heatmap calculation module is connected to the output of the attention fusion module; this input has the same width and height as the semantic segmentation network feature map, its number of channels equals the number of vertex coordinates to be regressed, and each channel represents the heatmap of one vertex; during training, the loss is calculated directly between this input and the label, and during inference the index of the maximum value of each channel is calculated with an argmax function, giving each predicted vertex coordinate.
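The fusion and decoding steps described above can be illustrated with a minimal NumPy sketch. It is not the implementation of the invention: the function name fuse_and_decode, the array shapes and the toy values are hypothetical, and it is assumed that the 1x1 output of the mask attention module is broadcast over the whole feature map when the "pixel-by-pixel addition" is performed. The map referred to in the text as the thermodynamic diagram is called the heatmap here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_and_decode(conv_out, attention):
    """conv_out: (K, H, W) output of the feature map convolution module.
    attention: (K,) output of the mask attention module (assumed 1x1 per channel).
    Returns the K heatmaps and the K predicted vertices as (x, y) pairs."""
    # Assumption: the per-channel attention value is broadcast to every pixel.
    fused = conv_out + attention[:, None, None]
    heatmaps = sigmoid(fused)
    k, h, w = heatmaps.shape
    # argmax over the flattened H*W positions of each channel
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return heatmaps, np.stack([xs, ys], axis=1)

# Toy input: 4 vertex channels on a 6x8 feature map, one clear peak per channel.
conv_out = np.full((4, 6, 8), -5.0)
peaks = [(1, 1), (6, 1), (6, 4), (1, 4)]  # hypothetical (x, y) positions
for channel, (x, y) in enumerate(peaks):
    conv_out[channel, y, x] = 5.0
attention = np.zeros(4)

heatmaps, vertices = fuse_and_decode(conv_out, attention)
print(vertices.tolist())  # [[1, 1], [6, 1], [6, 4], [1, 4]]
```

Under the broadcasting assumption, the attention term shifts each channel by a constant, so it changes the heatmap magnitudes seen by the training loss but not the position returned by argmax. The decoded coordinates are in feature-map pixels.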

A certificate image vertex coordinate regression identification method based on semantic segmentation uses a coordinate regression module, with any of a variety of semantic segmentation networks as the backbone network, to directly obtain the polygon vertex coordinates of a certificate area; the coordinate regression module finally obtains the coordinates of each vertex from the index of the maximum value in a heatmap.

Preferably, in the above technical solution, the semantic segmentation network features and the semantic segmentation results are given by a general semantic segmentation model, and the coordinate regression module is disposed behind the semantic segmentation model.

Preferably, in the above technical solution, the feature map convolution module takes the semantic segmentation network features as input and produces an output for the next step, whose width and height are the same as those of the input feature map and whose number of channels equals the number of vertex coordinates to be regressed;

the mask attention module applies a plurality of 3x3 convolution, ReLU activation and Batch Normalization layers to the semantic segmentation result and then performs global pooling to obtain an intermediate feature layer with a width and height of 1 and a number of channels equal to the number of vertex coordinates to be regressed; this layer is passed through two fully connected layers and the result is output to the next step;

the attention fusion module adds the output feature maps of the feature map convolution module and the mask attention module pixel by pixel, and then passes the sum through a sigmoid activation layer to obtain the output for the next step;

the heatmap calculation module receives the output of the attention fusion module, whose width and height are the same as those of the semantic segmentation network feature map and whose number of channels equals the number of vertex coordinates to be regressed, each channel representing the heatmap of one vertex; during training, the loss is calculated directly between this input and the label, and during inference the index of the maximum value of each channel is calculated with an argmax function, giving each predicted vertex coordinate;

the labels of the coordinate regression module are vertex heatmaps, which can be generated automatically before training from the vertices of the minimum circumscribed polygon of the existing semantic segmentation label map; during training, Smooth L1 is used as the loss function to calculate the loss between the label heatmaps and the heatmaps output by the network. The coordinate regression module can be trained independently as an add-on module, or its loss can be combined with the loss function of the original semantic segmentation model by weighted calculation. The coordinate heatmaps are independent of image content and can provide anti-occlusion capability within the region.
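The label generation and loss calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the method of the invention: the text does not say how a vertex is turned into a heatmap (the "thermodynamic diagram" of the text), so a Gaussian peak with a hypothetical width sigma is assumed; the vertices are assumed to have been extracted from the label mask already; and the function names, the beta parameter and the weight value are hypothetical.

```python
import numpy as np

def vertex_heatmaps(vertices, height, width, sigma=1.5):
    """Builds one label heatmap per vertex.
    Assumption: a Gaussian peak centred on the vertex (x, y); the text does
    not specify the shape of the peak."""
    ys, xs = np.mgrid[0:height, 0:width]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
            for x, y in vertices]
    return np.stack(maps).astype(np.float32)

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss: quadratic for small errors, linear for large ones."""
    diff = np.abs(pred - target)
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(loss.mean())

def total_loss(seg_loss, heatmap_loss, weight=1.0):
    """Weighted sum of the segmentation loss and the heatmap loss
    (weight is a hypothetical hyperparameter)."""
    return seg_loss + weight * heatmap_loss

# Four hypothetical vertices of a certificate on an 8x12 feature map.
labels = vertex_heatmaps([(2, 1), (9, 1), (9, 6), (2, 6)], height=8, width=12)
pred = np.zeros_like(labels)           # stand-in for the network output
heatmap_loss = smooth_l1(pred, labels)
joint = total_loss(seg_loss=0.3, heatmap_loss=heatmap_loss, weight=0.5)
```

When the module is trained on its own, only heatmap_loss would be minimized; the weighted sum corresponds to the joint training option mentioned in the text.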

Compared with the prior art, the invention has the following beneficial effects:

(1) the coordinate regression module is highly adaptable: it can be added after most semantic segmentation models and modified according to the output structure of the backbone network;

(2) where vertex coordinates or masks already exist, the data need not be annotated again; supervised training is carried out directly after conversion;

(3) the coordinate regression module directly obtains all vertex coordinates of the certificate area through network calculation, is not limited to fixed shapes such as rectangles, and can resist various kinds of image deformation;

(4) the coordinate points are independent of image content, and when the module is fine-tuned together with the backbone network they can provide anti-occlusion capability within the certificate image and correct errors of the segmentation network;

(5) no post-processing method is needed to further calculate the circumscribed rectangle, the calculation can be accelerated on the GPU, process complexity and tool dependence at deployment are reduced, and calculation speed is remarkably improved.

Description of the drawings:

FIG. 1 is a prior art schematic;

FIG. 2 is a schematic flow chart of the present invention;

FIG. 3 is a schematic diagram of a network architecture according to the present invention;

FIG. 4 is a schematic diagram of the regression method of the present invention.

The specific implementation mode is as follows:

the following detailed description of specific embodiments of the invention is provided, but it should be understood that the scope of the invention is not limited to the specific embodiments.

Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.

The invention provides a certificate image vertex coordinate regression method based on semantic segmentation. Using data generated by users of the Che300 automotive-industry app, a semantic segmentation dataset and an existing semantic segmentation model are built for certificate images such as motor vehicle registration certificates and driving certificates, and the improved semantic segmentation method is used to obtain the vertex coordinates of the certificates in the images. The certificate image vertex coordinate regression method based on semantic segmentation adopted in this patent is specifically as follows:

1. An innovative coordinate regression module is used, with any of a variety of semantic segmentation networks as the backbone network, to directly obtain the polygon vertex coordinates of the certificate area, replacing post-processing operations such as solving for the circumscribed polygon. The coordinate regression module finally obtains the coordinates of each vertex from the index of the maximum value in the heatmap; no image post-processing algorithm is needed to calculate the circumscribed rectangle, dependence on image processing libraries is reduced at deployment, and the computation is fast and efficient.

2. The overall network structure is shown in FIG. 3 and FIG. 4. The semantic segmentation network features and the semantic segmentation results are given by a general semantic segmentation model, such as DeepLab, UNet or PSPNet. The coordinate regression module comprises a feature map convolution module, a mask attention module, an attention fusion module and a heatmap calculation module, and is arranged after the semantic segmentation module.

3. The input of the feature map convolution module is the semantic segmentation network feature, namely the network layer of the semantic segmentation model that is used to calculate the final segmentation result. The module comprises a plurality of 3x3 convolution layers, ReLU activation layers and Batch Normalization layers, and produces an output whose width and height are the same as those of the input feature map and whose number of channels equals the number of vertex coordinates to be regressed.

4. The input of the mask attention module is the semantic segmentation result. After a plurality of 3x3 convolution layers, ReLU activation layers and Batch Normalization layers, global pooling is performed to obtain an intermediate feature layer with a width and height of 1 and a number of channels equal to the number of vertex coordinates to be regressed; the output of the module is obtained after this layer passes through two fully connected layers.

5. The attention fusion module adds the output feature maps of the feature map convolution module and the mask attention module pixel by pixel, and outputs the result through a sigmoid activation layer.

6. The input of the heatmap calculation module is the output of the attention fusion module. Its width and height are the same as those of the semantic segmentation network feature map, its number of channels equals the number of vertex coordinates to be regressed, and each channel represents the heatmap of one vertex. During training, the loss is calculated directly between this input and the label; during inference, the index of the maximum value of each channel is calculated with an argmax function, giving each predicted vertex coordinate.

7. The labels of the coordinate regression module are vertex heatmaps, which can be generated automatically before training from the vertices of the minimum circumscribed polygon of the existing semantic segmentation label map. During training, Smooth L1 is used as the loss function to calculate the loss between the label heatmaps and the heatmaps output by the network. The coordinate regression module can be trained independently as an add-on module, or its loss can be combined with the loss function of the original semantic segmentation model by weighted calculation. The coordinate heatmaps are independent of image content and can provide anti-occlusion capability within the region.

8. To verify the effect of the coordinate regression module, a comparative test was performed using 198 motor vehicle registration certificate pictures containing 4 semantic segmentation classes.

[Results table not reproduced in this text.]
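As an illustration of the mask attention module in step 4 above, the following NumPy sketch shows only the pooling and fully connected part, operating on features that are assumed to have already passed through the 3x3 convolution, ReLU and Batch Normalization layers. Several details are assumptions not stated in the text: average pooling is used for the "global pooling", a ReLU is placed between the two fully connected layers, and the hidden width, function name and random weights are hypothetical.

```python
import numpy as np

def mask_attention(mask_features, w1, b1, w2, b2):
    """mask_features: (K, H, W) features computed from the segmentation result.
    Returns one attention value per vertex channel, shape (K,)."""
    # Global pooling to a 1x1 map per channel (average pooling assumed).
    pooled = mask_features.mean(axis=(1, 2))
    # First fully connected layer, followed by an assumed ReLU.
    hidden = np.maximum(w1 @ pooled + b1, 0.0)
    # Second fully connected layer maps back to K channels.
    return w2 @ hidden + b2

rng = np.random.default_rng(0)
K, H, W, hidden_dim = 4, 16, 16, 8      # hypothetical sizes
mask_features = rng.normal(size=(K, H, W))
w1, b1 = rng.normal(size=(hidden_dim, K)), np.zeros(hidden_dim)
w2, b2 = rng.normal(size=(K, hidden_dim)), np.zeros(K)

attention = mask_attention(mask_features, w1, b1, w2, b2)
print(attention.shape)  # (4,)
```

The resulting vector has one entry per vertex channel, which is what allows it to be added to the K-channel output of the feature map convolution module in step 5.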

the foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims

1. A certificate image vertex coordinate regression system based on semantic segmentation is characterized in that: the system comprises a coordinate regression module which is arranged behind a semantic segmentation module.

2. The semantic segmentation based certificate image vertex coordinate regression system of claim 1, wherein:

the semantic segmentation module is used for performing semantic segmentation according to the backbone network output structure by adopting a semantic segmentation model;

3. The semantic segmentation based certificate image vertex coordinate regression system of claim 1 or 2, characterized in that: the semantic segmentation module adopts DeepLab, UNet, PSPNet or another semantic segmentation model.

4. The semantic segmentation based certificate image vertex coordinate regression system of claim 1 or 2, characterized in that: the coordinate regression module comprises a feature map convolution module, a mask attention module, an attention fusion module and a heatmap calculation module;

5. A certificate image vertex coordinate regression identification method based on semantic segmentation, characterized by comprising: using a coordinate regression module, with any of a variety of semantic segmentation networks as the backbone network, to directly obtain the polygon vertex coordinates of the certificate area, the coordinate regression module finally obtaining the coordinates of each vertex from the index of the maximum value in a heatmap.

6. The regression recognition method for certificate image vertex coordinates based on semantic segmentation as claimed in claim 1, wherein the semantic segmentation network features and the semantic segmentation result are given by a common semantic segmentation model, and the coordinate regression module is arranged behind the semantic segmentation model.

7. The certificate image vertex coordinate regression recognition method based on semantic segmentation as claimed in claim 1, wherein the feature map convolution module takes the semantic segmentation network features as input and produces an output for the next step, whose width and height are the same as those of the input feature map and whose number of channels equals the number of vertex coordinates to be regressed;

the labels of the coordinate regression module are vertex heatmaps, which can be generated automatically before training from the vertices of the minimum circumscribed polygon of the existing semantic segmentation label map; during training, Smooth L1 is used as the loss function to calculate the loss between the label heatmaps and the heatmaps output by the network.