
CN113822428A - Neural network training method and device and image segmentation method - Google Patents


Info

Publication number: CN113822428A
Application number: CN202110905429.5A
Authority: CN (China)
Prior art keywords: neural network, feature, data set, image, module
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 王春, 陈永录, 张飞燕
Current Assignee: Industrial and Commercial Bank of China Ltd (ICBC) (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Industrial and Commercial Bank of China Ltd (ICBC)
Application filed by: Industrial and Commercial Bank of China Ltd (ICBC)
Priority to: CN202110905429.5A
Publication of: CN113822428A

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a context-coding-based neural network training method and apparatus, and an image segmentation method, which can be applied in the artificial intelligence field, the finance field, or other fields. The training method includes: inputting a training data set and a verification data set into a feature encoder module to obtain feature images, where both data sets include multiple sample images, the sample images include positive and negative samples, and both data sets carry ground-truth labels; inputting the feature images into a context information extraction module to generate high-level feature maps; inputting the high-level feature maps into a feature decoder module to generate predicted segmentation results for the sample images; determining the loss function of the neural network according to the types of the positive and negative samples in the sample images, calculating the loss value between the predicted segmentation result and the ground-truth label based on that loss function, and updating the parameters of the neural network according to the loss value; and outputting the trained neural network.

Description

Neural network training method and device and image segmentation method
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a neural network training method and apparatus based on context coding, an electronic device, a readable storage medium, and an image segmentation method.
Background
Neural networks are applied ever more widely across many fields; in the technical field of image segmentation, a neural network must be trained before it can segment images. In the related art, the positive and negative samples in the sample database are unevenly distributed when an image segmentation network is trained. This imbalance skews the training result, so the final segmentation results vary widely, the boundary of the target region cannot be delineated accurately, and the output does not faithfully reflect the true segmentation.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a neural network training method, apparatus, electronic device, readable storage medium, and image segmentation method based on context coding.
According to a first aspect of the present disclosure, there is provided a method for training a neural network based on context coding, the neural network including a feature encoder module, a context information extraction module, and a feature decoder module. The method includes: inputting a training data set and a verification data set to the feature encoder module to obtain feature images, where each data set comprises a plurality of sample images, the sample images comprise positive and negative samples, and both data sets carry ground-truth labels; inputting the feature images into the context information extraction module, extracting context semantic information, and generating high-level feature maps; inputting the high-level feature maps into the feature decoder module to generate predicted segmentation results for the sample images; determining a loss function of the neural network according to the types of the positive and negative samples in the sample images, calculating a loss value between the predicted segmentation result and the ground-truth label based on that loss function, and updating the parameters of the neural network according to the loss value; and judging whether the number of iterations over all sample images has reached a first threshold, and outputting the trained neural network once it has.
According to an embodiment of the present disclosure, the feature encoder module and the feature decoder module each include multiple stages of multi-scale fusion modules; two adjacent stages of multi-scale fusion modules in the feature encoder module are connected by a convolution with a preset stride; two adjacent stages of multi-scale fusion modules in the feature decoder module are connected by a transposed convolution with a preset stride; and corresponding encoder and decoder stages exchange features through an element-wise addition operation.
According to an embodiment of the present disclosure, each multi-scale fusion module includes: n first convolutional layers, where the features extracted by layers 1 through n-1 are concatenated with the feature extracted by the n-th layer to obtain first complex features at different scales, n being a natural number; a second convolutional layer, which extracts a second feature of the input image and adds it to the first complex features to obtain second complex features; and an activation function layer and a normalization layer, which process the second complex features and output the result.
According to an embodiment of the present disclosure, before the training data set and the verification data set are input to the feature encoder module, the method further includes preprocessing the sample images, the preprocessing including: converting the format of the sample image to generate a target-format image; scaling the target-format image to generate a target-size image; and applying normalization and binarization to the target-size image.
According to an embodiment of the present disclosure, after preprocessing, the sample images are randomly divided into a training data set and a verification data set at a set ratio, and data enhancement is applied to the training data set; the data enhancement includes at least one of translation, random cropping, and contrast transformation of the binarized images.
According to an embodiment of the present disclosure, determining the loss function of the neural network according to the types of the positive and negative samples in the sample image comprises: when the sample image input to the neural network is determined to be a positive sample, computing the loss function by formula (1):

$$L_{pos} = 1 - \frac{2\sum_i G_i P_i}{\sum_i G_i + \sum_i P_i} \tag{1}$$

where G is the ground-truth label and P is the prediction result.
According to an embodiment of the present disclosure, determining the loss function of the neural network according to the types of the positive and negative samples in the sample image further comprises: when the sample image input to the neural network is determined to be a negative sample, computing the loss function by formula (2):

$$L_{neg} = \frac{1}{N}\sum_{i:\,P_i > t} \left|G_i - P_i\right| \tag{2}$$

where G is the ground-truth label, P is the prediction result, t is the second threshold, and N is the number of activations corresponding to negative samples in the training data set that exceed the second threshold.
According to an embodiment of the present disclosure, calculating a loss value between the predicted segmentation result and the ground-truth label based on the loss function of the neural network, and updating the parameters of the neural network according to the loss value, includes: calculating a first loss value between the predicted segmentation result and the ground-truth label during training, using the loss function obtained on the training data set; calculating a second loss value between the predicted segmentation result and the ground-truth label during verification, using the loss function obtained on the verification data set; and updating the parameters of the neural network if the second loss value is less than the first loss value.
A second aspect of the present disclosure provides an image segmentation method in which a target image is input to a neural network to obtain an image segmentation result, the neural network having been trained using the method described above.
A third aspect of the present disclosure provides a context-coding-based neural network training apparatus, the neural network including a feature encoder module, a context information extraction module, and a feature decoder module, the apparatus including: an input module configured to input a training data set and a verification data set to the feature encoder module to obtain feature images, where each data set comprises a plurality of sample images, the sample images comprise positive and negative samples, and both data sets carry ground-truth labels; an extraction module configured to input the feature images to the context information extraction module, extract context semantic information, and generate high-level feature maps; a generation module configured to input the high-level feature maps into the feature decoder module and generate predicted segmentation results for the sample images; an updating module configured to determine a loss function of the neural network according to the types of the positive and negative samples in the sample images, calculate a loss value between the predicted segmentation result and the ground-truth label based on that loss function, and update the parameters of the neural network according to the loss value; and an output module configured to judge whether the number of iterations over all sample images has reached a first threshold, and to output the trained neural network once it has.
According to an embodiment of the present disclosure, the neural network training apparatus further includes a preprocessing module configured to, before the training data set and the verification data set are input to the feature encoder module, convert the format of the sample images to generate target-format images, scale those images to generate target-size images, and apply normalization and binarization to the target-size images.
According to an embodiment of the present disclosure, the neural network training apparatus further includes a data enhancement module; after preprocessing, the sample images are randomly divided into a training data set and a verification data set at a set ratio, and the data enhancement module applies translation and/or random cropping and/or contrast transformation to the binarized images of the training data set.
According to an embodiment of the present disclosure, the update module includes an update submodule configured to calculate a first loss value between the predicted segmentation result and the ground-truth label during training, using the loss function obtained on the training data set; calculate a second loss value between the predicted segmentation result and the ground-truth label during verification, using the loss function obtained on the verification data set; and update the parameters of the neural network if the second loss value is less than the first loss value.
A fourth aspect of the present disclosure provides an electronic device, comprising: one or more processors; a storage device for storing executable instructions that, when executed by the processor, implement a neural network training method in accordance with the foregoing.
A fifth aspect of the present disclosure provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, implement a neural network training method in accordance with the foregoing.
A sixth aspect of the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements a neural network training method in accordance with the above.
According to the embodiments of the present disclosure, a neural network with a context information extraction module is constructed, and during training the loss function is determined by the types of the positive and negative samples in the sample data set. Training with different loss functions for different sample types effectively avoids the vanishing-gradient situation caused by differing numbers of positive and negative samples, and thus effectively improves the accuracy of the network parameters obtained during training.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
fig. 1 schematically shows a schematic diagram of a system architecture to which the neural network training method of the embodiments of the present disclosure may be applied;
FIG. 2 schematically illustrates a flow diagram of a neural network training method in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates a structural diagram of a multi-scale fusion module of a neural network according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow diagram for preprocessing a sample image for a neural network, in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates a structural diagram of the dilated convolution module of a neural network in accordance with an embodiment of the present disclosure;
FIG. 6 schematically illustrates a structural schematic of a multi-scale pooling module of a neural network according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a structural diagram of a context coding based neural network according to an embodiment of the present disclosure;
fig. 8 schematically shows a block diagram of a training apparatus for a context coding based neural network according to an embodiment of the present disclosure; and
fig. 9 schematically shows a block diagram of an electronic device adapted for a neural network training method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
With the rapid development of the internet finance industry, the requirements on user information security keep rising. To guarantee this security, ever more security technologies, such as face recognition, are applied in the information security field. Face information cannot be copied or stolen and is simple, convenient, and intuitive to use, which makes it a focus of current security applications. During face identification and verification, however, the acquired face information must be processed, for example by segmenting the face image, to enable accurate comparison of face information.
In the related art, researchers have applied the U-Net model to face segmentation. The model uses an encoder-decoder structure divided into a down-sampling stage and an up-sampling stage; the network contains only convolutional and pooling layers, with no fully connected layer. The shallow, high-resolution layers solve pixel localization, while the deep layers solve pixel classification, enabling semantic-level segmentation. However, the successive pooling or strided convolution operations in the encoder lose some spatial information. Although such spatial invariance helps in classification and object detection tasks, it tends to hinder dense prediction tasks that require detailed spatial information, and it limits the model's ability to learn shape variation, so when segmented objects differ greatly (for example, faces at different angles, with different skin colors and organ shapes), the boundary of the target region cannot be drawn accurately. In addition, the face segmentation task suffers from an unbalanced distribution of positive and negative samples, and even similar samples can differ in classification difficulty.
In view of this, embodiments of the present disclosure provide a context-coding-based neural network training method and apparatus, and an image segmentation method. The neural network includes a feature encoder module, a context information extraction module, and a feature decoder module, and the training method includes: inputting a training data set and a verification data set to the feature encoder module to obtain feature images, where both data sets comprise a plurality of sample images, the sample images comprise positive and negative samples, and both data sets carry ground-truth labels; inputting the feature images into the context information extraction module, extracting context semantic information, and generating high-level feature maps; inputting the high-level feature maps into the feature decoder module to generate predicted segmentation results for the sample images; determining a loss function of the neural network according to the types of the positive and negative samples in the sample images, calculating a loss value between the predicted segmentation result and the ground-truth label based on that loss function, and updating the parameters of the neural network according to the loss value; and judging whether the number of iterations over all sample images has reached a first threshold, and outputting the trained neural network once it has.
According to the embodiments of the present disclosure, a neural network with a context information extraction module is constructed, and during training the loss function is determined by the types of the positive and negative samples in the sample data set. Training with different loss functions for different sample types effectively avoids the vanishing-gradient situation caused by differing numbers of positive and negative samples, and thus effectively improves the accuracy of the network parameters obtained during training.
Fig. 1 schematically shows a schematic diagram of a system architecture to which the neural network training method of the embodiments of the present disclosure may be applied. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. It should be noted that the neural network training method and apparatus based on context coding provided by the embodiment of the present disclosure may be used in the related aspects of image processing in the technical field of artificial intelligence, the technical field of image processing, and the financial field, and may also be used in any field other than the financial field.
As shown in fig. 1, an exemplary system architecture 100 to which the neural network training method may be applied may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
Users may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104, to receive or send messages and the like. Various image acquisition client applications, such as a photo application or a face-image capture application (for example only), can be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having display screens and supporting face information acquisition, including but not limited to smart phones, smart televisions, tablet computers, laptop portable computers, desktop computers, and the like. The terminal devices 101, 102, and 103 may acquire face image information of the user or other people through face information acquisition, and transmit the face image information to the server 105 through the network 104.
The server 105 may be a server that provides various services, such as a background management server (for example only) that processes or stores pictures sent by users using the terminal devices 101, 102, 103. The background management server may perform processing such as analysis on the received user data (e.g., a character image), and feed back a processing result (e.g., a segmentation or recognition result after image processing) to the terminal device.
It should be noted that the training method of the neural network provided by the embodiment of the present disclosure may be generally performed by the terminal devices 101, 102, 103 or the server 105. Accordingly, the neural network training device provided by the embodiment of the present disclosure may be generally disposed in the terminal device 101, 102, 103 or the server 105. The neural network training method provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the neural network training device provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The context coding-based neural network training method of the disclosed embodiments will be described in detail below with reference to fig. 2 to 7.
Fig. 2 schematically shows a flow diagram of a neural network training method according to an embodiment of the present disclosure. According to an embodiment of the present disclosure, the neural network includes a feature encoder module, a context information extraction module, and a feature decoder module.
As shown in fig. 2, the neural network training method 200 of the embodiment of the present disclosure includes operations S210 to S250.
In operation S210, a training data set and a verification data set are input to the feature encoder module to obtain feature images, where both data sets include a plurality of sample images, the sample images include positive and negative samples, and both data sets carry ground-truth labels.
For example, a plurality of images to be segmented in a data set are processed to obtain a training data set and a verification data set: the training data set is used to train the neural network, and the verification data set to validate the trained network. Both carry ground-truth labels, against which the training result can be judged and operations such as adjusting the network parameters can be performed. In the embodiments of the present disclosure, the sample images include positive and negative samples, whose numbers differ depending on the data set selected each time.
In operation S220, the feature images are input to the context information extraction module, context semantic information is extracted, and high-level feature maps are generated.
For example, the context information extraction module extracts the context semantic information in a picture to generate a high-level feature map.
In operation S230, the high-level feature maps are input to the feature decoder module, and predicted segmentation results of the sample images are generated.
In operation S240, a loss function of the neural network is determined according to the types of the positive and negative samples in the sample image, a loss value between the predicted segmentation result and the ground-truth label is calculated based on that loss function, and the parameters of the neural network are updated according to the loss value.
For example, the sample images include positive and negative samples. For a positive sample, the Dice loss can be used to compute the loss between the predicted segmentation result and the ground-truth label. For a negative sample, however, the Dice coefficient equals 0, so the Dice loss equals 1 and its gradient is 0: the sample then has no effect on the model during training, pixels in the target region cannot be classified correctly, and both background and target regions tend to be predicted as background. Therefore, when the sample input to the neural network is negative, a new loss function different from the Dice loss is determined for it. This effectively avoids the zero-gradient situation, improves the discrimination of background and target regions during training, and improves the computational accuracy of neural network training.
In an embodiment of the present disclosure, the feature encoder module and the feature decoder module each comprise multiple stages of multi-scale fusion modules; two adjacent stages in the feature encoder module are connected by a convolution with a preset stride, two adjacent stages in the feature decoder module are connected by a transposed convolution with a preset stride, and corresponding stages exchange features through an addition operation.
For example, the feature encoder module includes 4 multi-scale fusion modules, with a 3 × 3 convolution of stride 2 between every two adjacent modules; the feature decoder module includes 4 multi-scale fusion modules, with a 3 × 3 transposed convolution of stride 2 between every two adjacent modules. The multi-scale fusion modules in the encoder and those in the decoder correspond one-to-one, and feature addition is performed between corresponding modules.
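The following minimal PyTorch sketch illustrates these inter-stage connections; the channel counts (64 and 128) and padding choices are assumptions for illustration, not values from the patent.

```python
import torch
import torch.nn as nn

# Assumed illustration: a 3x3 stride-2 convolution downsamples between adjacent
# encoder stages, and a 3x3 stride-2 transposed convolution upsamples between
# adjacent decoder stages, so corresponding features can be added element-wise.
down = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
up = nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2,
                        padding=1, output_padding=1)

x = torch.randn(1, 64, 64, 64)
y = down(x)                      # -> (1, 128, 32, 32)
assert up(y).shape == x.shape    # restored to (1, 64, 64, 64)
```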
Fig. 3 schematically shows a structural diagram of a multi-scale fusion module 300 of a neural network according to an embodiment of the present disclosure.
As shown in fig. 3, each stage of the multi-scale fusion module 300 includes: n first convolutional layers, where the features extracted by layers 1 through n-1 are concatenated with the feature extracted by the n-th layer to obtain first complex features at different scales, n being a natural number; a second convolutional layer, which extracts a second feature of the input image and adds it to the first complex features to obtain second complex features; and an activation function layer and a normalization layer, which process the second complex features and output the result.
For example, the multi-scale fusion module includes 3 first convolutional layers 302, 303, 304, each of which may be a 3 × 3 convolution. The first convolution 302 and the second convolution 303 extract first features of the input image 301, and residual connections are added after them: the features extracted by 302 and 303 are concatenated with the feature produced by the third convolution 304, yielding first complex features at different scales. To keep the numbers of input and output channels the same, the filter counts of the three consecutive convolutional layers are set to (0.2, 0.3, 0.5) × Input_channel respectively, where Input_channel is the number of input channels.
For example, the second convolutional layer 305 may be a 1 × 1 convolution that extracts the second feature of the input image 301; this 1 × 1 convolution is applied to the input, and its output is added to the first complex features obtained above to generate the second complex features.
For example, the activation function 306 may be ReLU, and the normalization 307 may be Batch Normalization; after normalization, the result serves as the input of the next convolution or transposed convolution and is finally emitted through the output 308.
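The block below is a minimal PyTorch sketch of such a fusion module under the description above; the exact wiring of the residual branch, the padding, and the BN/ReLU ordering are assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Three cascaded 3x3 convs whose outputs are concatenated (first complex
    feature), plus a 1x1 shortcut added on top (second complex feature),
    followed by BatchNorm and ReLU."""
    def __init__(self, in_ch):
        super().__init__()
        # Filter counts (0.2, 0.3, 0.5) * in_ch keep input/output channels equal;
        # the last count absorbs rounding so the concatenation sums to in_ch.
        c1, c2 = int(0.2 * in_ch), int(0.3 * in_ch)
        c3 = in_ch - c1 - c2
        self.conv1 = nn.Conv2d(in_ch, c1, 3, padding=1)
        self.conv2 = nn.Conv2d(c1, c2, 3, padding=1)
        self.conv3 = nn.Conv2d(c2, c3, 3, padding=1)
        self.shortcut = nn.Conv2d(in_ch, in_ch, 1)   # 1x1 conv on the input
        self.bn = nn.BatchNorm2d(in_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        f1 = self.conv1(x)
        f2 = self.conv2(f1)
        f3 = self.conv3(f2)
        complex1 = torch.cat([f1, f2, f3], dim=1)     # splice multi-scale features
        complex2 = complex1 + self.shortcut(x)        # add the 1x1 second feature
        return self.relu(self.bn(complex2))
```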
In operation S250, it is determined whether the number of iterations over all sample images has reached a first threshold; once it has, the trained neural network is output.
For example, the first threshold is an epoch value: the method judges whether the number of iterations over all sample images in the data set has reached a preset epoch value, from which the degree of training of the neural network can be determined. The epoch value may lie between 100 and 200; if it is set to 150, training stops once 150 epochs are exceeded and the trained network is output. If the epoch value has not yet reached the first threshold, sample images continue to be fed in for further iterations; once the first threshold is reached, the training process ends and the trained neural network is output.
Fig. 4 schematically illustrates a flow diagram of preprocessing a sample image of a neural network according to an embodiment of the present disclosure.
In an embodiment of the present disclosure, before the training data set and the verification data set are input to the feature encoder module, the method further includes preprocessing the sample images, the preprocessing including: converting the format of the sample image to generate a target-format image; scaling the target-format image to generate a target-size image; and applying normalization and binarization to the target-size image.
As shown in fig. 4, the flow 400 of preprocessing the sample image includes operations S410 to S430.
In operation S410, the sample image is format-converted, generating a target format image.
For example, collected sample images come in various formats, while the images input to the neural network must satisfy a specific format; the sample images are therefore format-converted as needed to generate target-format images that meet the requirement.
In operation S420, the target format image is image-scaled, generating a target size image.
For example, the target-format image is scaled to the required size; for instance, the image dimensions are set to 256 × 256 pixels.
In operation S430, the target size image is subjected to normalization processing and binarization processing.
For example, the scaled target-size image is normalized to the [0, 1] interval, and the normalized image is then binarized: with a threshold of 0.5, pixel values above the threshold are set to 1 and values below it to 0.
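A minimal sketch of this preprocessing pipeline is shown below; the use of Pillow and NumPy, the grayscale conversion, and the function name are assumptions for illustration.

```python
import numpy as np
from PIL import Image

def preprocess(path, size=(256, 256), threshold=0.5):
    """Format-convert, scale to the target size, normalize to [0, 1],
    then binarize at the given threshold."""
    img = Image.open(path).convert("L")              # format conversion
    img = img.resize(size)                           # target-size image
    arr = np.asarray(img, dtype=np.float32) / 255.0  # normalize to [0, 1]
    return (arr > threshold).astype(np.float32)      # > 0.5 -> 1, else 0
```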
According to the embodiment of the disclosure, after the sample image is preprocessed, the method further comprises the steps of randomly dividing the sample image into a training data set and a verification data set according to a set proportion, and performing data enhancement on the training data set. The data enhancement comprises at least one of translation transformation, random cutting and contrast transformation of the image after the binarization processing.
For example, the sample images are randomly divided into a training data set and a verification data set at a set ratio, e.g. 7 : 3, one part forming the training set and the other the verification set.
In the embodiment of the disclosure, data enhancement is performed through the training data set, so that the training data volume can be increased, and the neural network obtained through training has higher accuracy.
For example, the data enhancement may be a shift transform (shift), i.e. a shift of the original picture in some way (the step size, range and direction of the shift being determined in a predefined or random manner) within the image plane.
The data enhancement may also be Random Crop (Random Crop), i.e. randomly defining the region of interest to Crop the image, corresponding to adding Random perturbations.
The data enhancement may also be a Contrast transform (Contrast), i.e. changing the image Contrast, which is equivalent to keeping the hue component H constant and changing the brightness component V and saturation S in HSV space, for simulating the illumination change of the real environment.
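The sketch below applies these three enhancements at random to a normalized image array; the probabilities and parameter ranges are illustrative assumptions, not values from the patent.

```python
import random
import numpy as np

def augment(img):
    """Randomly apply translation, random crop (zero-padded back to size),
    and contrast transformation to a float image in [0, 1]."""
    h, w = img.shape
    if random.random() < 0.5:                 # translation within the image plane
        dx, dy = random.randint(-10, 10), random.randint(-10, 10)
        img = np.roll(img, (dy, dx), axis=(0, 1))
    if random.random() < 0.5:                 # random crop of a region of interest
        top, left = random.randint(0, h // 8), random.randint(0, w // 8)
        out = np.zeros_like(img)
        crop = img[top:, left:]
        out[:crop.shape[0], :crop.shape[1]] = crop
        img = out
    if random.random() < 0.5:                 # contrast: scale deviations from 0.5
        img = np.clip(0.5 + random.uniform(0.8, 1.2) * (img - 0.5), 0.0, 1.0)
    return img
```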
In an embodiment of the present disclosure, determining the loss function of the neural network according to the types of the positive and negative samples in the sample image includes: when the sample image input to the neural network is determined to be a positive sample, computing the loss function by formula (1):

$$L_{pos} = 1 - \frac{2\sum_i G_i P_i}{\sum_i G_i + \sum_i P_i} \tag{1}$$

where G is the ground-truth label and P is the prediction result.
For example, when the sample image input to the neural network is determined to be a positive sample, the loss function is the standard Dice loss of formula (1).
In an embodiment of the present disclosure, determining the loss function of the neural network according to the sample types further includes: when the sample image input to the neural network is determined to be a negative sample, taking as the new loss function the L1 norm over the elements of the model output that exceed the second threshold, computed by formula (2):

$$L_{neg} = \frac{1}{N}\sum_{i:\,P_i > t} \left|G_i - P_i\right| \tag{2}$$

where G is the ground-truth label, P is the prediction result, t is the second threshold, and N is the number of activations corresponding to negative samples in the training data set that exceed the second threshold.
For example, the second threshold may lie between 0 and 1, preferably 0.001.
according to the embodiment of the disclosure, different loss functions are determined by determining different types of samples input into the neural network, so that the situation that the gradient is 0 when the positive and negative samples are inconsistent can be avoided, the recognition accuracy of the background area and the target area in the neural network training process is improved, and the calculation accuracy of the neural network training is improved.
In an embodiment of the present disclosure, calculating a loss value between the predicted segmentation result and the ground-truth label based on the loss function of the neural network, and updating the parameters of the neural network according to the loss value, includes: calculating a first loss value between the predicted segmentation result and the ground-truth label during training, using the loss function obtained on the training data set; calculating a second loss value between the predicted segmentation result and the ground-truth label during verification, using the loss function obtained on the verification data set; and updating the parameters of the neural network if the second loss value is less than the first loss value.
For example, the first loss value is the loss computed during training and the second loss value the loss computed during verification. If the second loss value is greater than the first, the parameters of the neural network have reached the expected result during training, and operation S250 of the disclosed embodiment is performed. If the second loss value is smaller than the first, the parameters meet the requirements during verification; the currently trained neural network model is saved to update the network parameters, and operation S250 is then performed.
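A small sketch of this update rule, which also keeps the best verification loss seen so far (the best-value check is described later in the training procedure); the file name and the use of torch.save are assumptions.

```python
import torch

best_val_loss = float("inf")

def maybe_update(model, train_loss, val_loss):
    """Save the current parameters when the verification loss is below the
    training loss and improves on the best verification loss so far."""
    global best_val_loss
    if val_loss < train_loss and val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), "best_model.pth")
```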
In the embodiments of the present disclosure, the context information extraction module of the neural network is composed of a dense dilated-convolution module and a residual multi-scale pooling module.
Fig. 5 schematically shows a structural diagram of the dense dilated-convolution module of a neural network according to an embodiment of the present disclosure.
As shown in fig. 5, the dense dilated-convolution module stacks convolutions in a cascading fashion over four branches: the dilation rates increase through 1, 1, 3 and 5, and the branches have receptive fields of 3, 7, 9 and 19 respectively, so that feature information is captured at multiple scales. A 1 × 1 convolutional layer at the end of each dilated branch performs linear rectification. Finally, using the shortcut mechanism of ResNet, the original features are added position by position to the outputs of the parallel branches.
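The sketch below composes four branches whose receptive fields are 3, 7, 9 and 19, matching the description above; the exact per-branch layer ordering is an assumption.

```python
import torch.nn as nn

class DenseDilatedBlock(nn.Module):
    """Four cascaded dilated-convolution branches (receptive fields 3, 7, 9,
    19), each non-trivial branch ending in a 1x1 conv, summed with the input
    via a ResNet-style shortcut."""
    def __init__(self, ch):
        super().__init__()
        def c3(d):  # 3x3 dilated conv preserving spatial size
            return nn.Conv2d(ch, ch, 3, padding=d, dilation=d)
        self.b1 = c3(1)                                              # RF 3
        self.b2 = nn.Sequential(c3(3), nn.Conv2d(ch, ch, 1))         # RF 7
        self.b3 = nn.Sequential(c3(1), c3(3), nn.Conv2d(ch, ch, 1))  # RF 9
        self.b4 = nn.Sequential(c3(1), c3(3), c3(5),
                                nn.Conv2d(ch, ch, 1))                # RF 19
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Shortcut: add the original features to every branch output.
        return x + self.relu(self.b1(x)) + self.relu(self.b2(x)) \
                 + self.relu(self.b3(x)) + self.relu(self.b4(x))
```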
Fig. 6 schematically illustrates a structural schematic diagram of a multi-scale pooling module of a neural network according to an embodiment of the present disclosure.
As shown in fig. 6, the multi-scale pooling module encodes global context information using four fields of view of different sizes, 2 × 2, 3 × 3, 5 × 5 and 6 × 6, the four levels producing feature maps of various sizes. To reduce the dimensionality of the weights and the computational cost, a 1 × 1 convolution after each pooling level reduces its channel count to 1/M of the original input, where M is the number of channels of the original input. The low-dimensional feature map of each level is then up-sampled to the size of the original input feature map by bilinear interpolation, and finally the original features are concatenated with the up-sampled feature maps of the several levels.
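A sketch of this pooling module follows; adaptive average pooling is assumed for the 2 × 2, 3 × 3, 5 × 5 and 6 × 6 fields of view, and each 1 × 1 conv reduces its level to a single channel (1/M of the input with M input channels).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePooling(nn.Module):
    """Pool at four scales, reduce each level to 1 channel with a 1x1 conv,
    bilinearly upsample to the input size, and concatenate with the input."""
    def __init__(self, ch, sizes=(2, 3, 5, 6)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveAvgPool2d(s) for s in sizes])
        self.convs = nn.ModuleList([nn.Conv2d(ch, 1, 1) for _ in sizes])

    def forward(self, x):
        h, w = x.shape[2:]
        outs = [x]
        for pool, conv in zip(self.pools, self.convs):
            y = conv(pool(x))                        # low-dimensional level map
            outs.append(F.interpolate(y, size=(h, w), mode="bilinear",
                                      align_corners=False))
        return torch.cat(outs, dim=1)                # ch + 4 output channels
```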
Fig. 7 schematically shows a structural diagram of a context coding-based neural network according to an embodiment of the present disclosure.
As shown in fig. 7, the neural network includes a feature encoder module, a context information extraction module, and a feature decoder module. The feature encoder module and the feature decoder module each comprise 4 multi-scale fusion modules; the fusion modules of the encoder and the decoder correspond one-to-one and add their features to each other. Adjacent fusion modules in the feature encoder module are connected by a 3 × 3 convolution of stride 2, and adjacent fusion modules in the feature decoder module by a 3 × 3 transposed convolution of stride 2.
In the training process of the embodiments of the present disclosure, the sample images are first preprocessed as described above (format conversion, image scaling, normalization, and binarization), then randomly divided into a training data set and a verification data set at a set ratio, with data enhancement applied to the training data set. The neural network is then trained on the training data set and validated on the verification data set. For example: the convolution kernel weights and the loss function of the neural network are first initialized to 0; the training data set is input into the context-coding-based neural network; the training data and the parameters of each node in the network are computed to realize the forward propagation of network training; the difference between the forward-propagated predicted segmentation result and the ground-truth label is computed as the loss, using formula (1) above for positive samples and formula (2) above for negative samples. Next, the verification data set is input into the neural network for verification; based on the same way of determining the loss function from the sample types, the loss between the predicted segmentation result and the ground-truth label during verification is computed. If this verification loss is smaller than the smallest loss seen in previous verification rounds, the currently trained network parameters are saved. Finally, it is judged whether the current iteration count has reached the preset epoch value; if not, the next iteration begins, and once it has, training is complete and the trained neural network is output.
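The following sketch strings these steps together, reusing the segmentation_loss function sketched earlier; the data-loader format (image, label, positive-sample flag), the Adam optimizer, and the learning rate are assumptions.

```python
import torch

def train(model, train_loader, val_loader, epochs=150, lr=1e-3):
    """Train with the sample-type-dependent loss, validate each epoch, save
    parameters on an improved verification loss, and stop at the epoch value."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_val = float("inf")
    for epoch in range(epochs):                       # first threshold: epochs
        model.train()
        for img, label, is_positive in train_loader:
            opt.zero_grad()
            loss = segmentation_loss(model(img), label, is_positive)
            loss.backward()                           # back-propagate the loss
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(segmentation_loss(model(img), label, is_positive).item()
                      for img, label, is_positive in val_loader)
        if val < best_val:                            # improved verification loss
            best_val = val
            torch.save(model.state_dict(), "best_model.pth")
    return model
```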
In the embodiments of the present disclosure, there is also provided an image segmentation method, comprising inputting a target image into a neural network to obtain an image segmentation result, the neural network having been trained using the training method described above.
According to the embodiments of the present disclosure, the trained neural network is used for image segmentation, for example within face recognition. Because face angles and backgrounds differ, the acquired face images vary widely, and the distribution of positive and negative samples in face image segmentation is unbalanced; segmenting face images with a neural network trained by the method above therefore improves the accuracy of image segmentation.
Fig. 8 schematically shows a block diagram of a training apparatus for a neural network based on context coding according to an embodiment of the present disclosure.
As shown in fig. 8, the neural network training device 800 of the embodiment of the present disclosure includes an input module 810, an extraction module 820, a generation module 830, an update module 840, and an output module 850.
The input module 810 is configured to input a training data set and a verification data set to the feature encoder module to obtain feature images, where both data sets include a plurality of sample images, the sample images include positive and negative samples, and both data sets carry ground-truth labels. In an embodiment, the input module 810 may be configured to perform operation S210 described above, which is not repeated here.
The extraction module 820 is configured to input the feature images to the context information extraction module, extract context semantic information, and generate high-level feature maps. In an embodiment, the extraction module 820 may be configured to perform operation S220 described above, which is not repeated here.
The generation module 830 is configured to input the high-level feature maps to the feature decoder module and generate predicted segmentation results for the sample images. In an embodiment, the generation module 830 may be configured to perform operation S230 described above, which is not repeated here.
The update module 840 is configured to determine a loss function of the neural network according to the types of the positive and negative samples in the sample images, calculate a loss value between the predicted segmentation result and the ground-truth label based on that loss function, and update the parameters of the neural network according to the loss value. In an embodiment, the update module 840 may be configured to perform operation S240 described above, which is not repeated here.
The output module 850 is configured to judge whether the number of iterations over all sample images has reached a first threshold, and to output the trained neural network once it has. In an embodiment, the output module 850 may be configured to perform operation S250 described above, which is not repeated here.
In an embodiment of the present disclosure, the neural network training apparatus further includes a preprocessing module configured to, before the training data set and the verification data set are input to the feature encoder module, convert the format of the sample images to generate target-format images, scale those images to generate target-size images, and apply normalization and binarization to the target-size images.
In an embodiment of the present disclosure, the neural network training apparatus further includes a data enhancement module; after preprocessing, the sample images are randomly divided into a training data set and a verification data set at a set ratio, and the data enhancement module applies translation and/or random cropping and/or contrast transformation to the binarized images of the training data set.
In an embodiment of the present disclosure, the update module includes an update submodule configured to calculate a first loss value between the predicted segmentation result and the ground-truth label during training, using the loss function obtained on the training data set; calculate a second loss value between the predicted segmentation result and the ground-truth label during verification, using the loss function obtained on the verification data set; and update the parameters of the neural network if the second loss value is less than the first loss value.
According to the embodiment of the present disclosure, any plurality of the input module 810, the extraction module 820, the generation module 830, the update module 840, the output module 850, the preprocessing module, the data enhancement module, and the update sub-module may be combined into one module to be implemented, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the input module 810, the extraction module 820, the generation module 830, the update module 840, the output module 850, the preprocessing module, the data enhancement module, and the update sub-module may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware, and firmware, or in any suitable combination of any of them. Alternatively, at least one of the input module 810, the extraction module 820, the generation module 830, the update module 840, the output module 850, the preprocessing module, the data enhancement module, and the update sub-module may be at least partially implemented as a computer program module that, when executed, may perform a corresponding function.
Fig. 9 schematically illustrates a block diagram of an electronic device adapted to implement a neural network training method in accordance with an embodiment of the present disclosure. The electronic device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 9, an electronic apparatus 900 according to an embodiment of the present disclosure includes a processor 901 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. Processor 901 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, which is likewise connected to the bus 904. The electronic device 900 may further include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output portion 907 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD) and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card or a modem. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as necessary, so that a computer program read therefrom can be installed into the storage portion 908 as needed.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments or may exist separately without being assembled into that apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement the neural network training method according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include the ROM 902 and/or the RAM 903 described above and/or one or more memories other than the ROM 902 and the RAM 903.
Embodiments of the present disclosure also include a computer program product comprising a computer program that contains program code for performing the method illustrated in the flowcharts. When the computer program product runs in a computer system, the program code causes the computer system to implement the neural network training method provided by the embodiments of the present disclosure.
When executed by the processor 901, the computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure. According to embodiments of the present disclosure, the systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may be transmitted and distributed as a signal over a network medium, downloaded and installed through the communication section 909, and/or installed from the removable medium 911. The program code contained in the computer program may be transmitted using any suitable medium, including but not limited to wireless, wireline, or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, the program code for carrying out the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages and/or assembly/machine languages. Such languages include, but are not limited to, Java, C++, Python, and C. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or sub-combined in various ways, even if such combinations are not expressly recited in the present disclosure. In particular, such combinations and/or sub-combinations may be made without departing from the spirit or teaching of the present disclosure, and all of them fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (13)

1. A neural network training method based on context encoding, the neural network comprising a feature encoder module, a context information extraction module, and a feature decoder module, the method comprising:
inputting a training data set and a verification data set into the feature encoder module to obtain feature images, wherein the training data set and the verification data set each comprise a plurality of sample images, the sample images comprise positive samples and negative samples, and the training data set and the verification data set have true labels;
inputting the feature images into the context information extraction module to extract contextual semantic information and generate a high-level feature map;
inputting the high-level feature map into the feature decoder module to generate a predicted segmentation result of the sample images;
determining a loss function of the neural network according to the types of the positive samples and negative samples in the sample images, calculating a loss value between the predicted segmentation result and the true label based on the loss function of the neural network, and updating the parameters of the neural network according to the loss value; and
judging whether the number of iterations over all sample images has reached a first threshold, and outputting the trained neural network once the first threshold is reached.

2. The neural network training method according to claim 1, wherein the feature encoder module and the feature decoder module each comprise multi-level multi-scale fusion modules;
adjacent multi-scale fusion modules in the feature encoder module are connected by a convolution operation with a predetermined stride, the convolution operation being used for feature addition between them; and
adjacent multi-scale fusion modules in the feature decoder module are connected by a transposed convolution operation with the predetermined stride, the transposed convolution operation being used for feature addition between them.

3. The neural network training method according to claim 2, wherein each level of the multi-level multi-scale fusion modules comprises:
n first convolutional layers, wherein the 1st to (n-1)th first convolutional layers extract first features of an input image, and the extracted first features are concatenated with the first feature extracted by the nth first convolutional layer to obtain first complex features at different scales, n being a natural number;
a second convolutional layer, which extracts a second feature of the input image and adds the extracted second feature to the first complex features to generate second complex features; and
an activation function layer and a normalization layer, which process the second complex features and output the processing result.

4. The neural network training method according to claim 1, wherein before the training data set and the verification data set are input into the feature encoder module, the method further comprises preprocessing the sample images, the preprocessing comprising:
converting the format of the sample images to generate target-format images;
scaling the target-format images to generate target-size images; and
normalizing and binarizing the target-size images.

5. The neural network training method according to claim 4, wherein after the sample images are preprocessed, the method further comprises randomly dividing the sample images into a training data set and a verification data set according to a set proportion, and performing data enhancement on the training data set;
wherein the data enhancement comprises at least one of translation transformation, random cropping, and contrast transformation applied to the binarized images.

6. The neural network training method according to claim 1, wherein determining the loss function of the neural network according to the types of the positive samples and negative samples in the sample images comprises:
when the sample image input to the neural network is determined to be a positive sample, calculating the loss function by the formula shown in figure FDA0003199644070000021, where G is the true label and P is the predicted result.

7. The neural network training method according to claim 6, wherein determining the loss function of the neural network according to the types of the positive samples and negative samples in the sample images further comprises:
when the sample image input to the neural network is determined to be a negative sample, calculating the loss function by the formula shown in figure FDA0003199644070000031, where G is the true label, P is the predicted result, and N denotes the number of samples in the training data set whose negative-sample activation values are greater than a second threshold.

8. The neural network training method according to claim 7, wherein calculating the loss value between the predicted segmentation result and the true label based on the loss function of the neural network, and updating the parameters of the neural network according to the loss value, comprises:
calculating, using the loss function obtained based on the training data set, a first loss value between the predicted segmentation result and the true label during training;
calculating, using the loss function obtained based on the verification data set, a second loss value between the predicted segmentation result and the true label during verification; and
updating the parameters of the neural network when the second loss value is smaller than the first loss value.

9. An image segmentation method, wherein a target image is input into a neural network to obtain an image segmentation result, the neural network having been trained using the method according to any one of claims 1 to 8.

10. A neural network training device based on context encoding, the neural network comprising a feature encoder module, a context information extraction module, and a feature decoder module, the device comprising:
an input module configured to input a training data set and a verification data set into the feature encoder module to obtain feature images, wherein the training data set and the verification data set each comprise a plurality of sample images, the sample images comprise positive samples and negative samples, and the training data set and the verification data set have true labels;
an extraction module configured to input the feature images into the context information extraction module, extract contextual semantic information, and generate a high-level feature map;
a generation module configured to input the high-level feature map into the feature decoder module to generate a predicted segmentation result of the sample images;
an update module configured to determine the loss function of the neural network according to the types of the positive samples and negative samples in the sample images, calculate the loss value between the predicted segmentation result and the true label based on the loss function of the neural network, and update the parameters of the neural network according to the loss value; and
an output module configured to judge whether the number of iterations over all sample images has reached a first threshold, and to output the trained neural network once the first threshold is reached.

11. An electronic device, comprising:
one or more processors; and
a storage device for storing executable instructions which, when executed by the processors, implement the neural network training method according to any one of claims 1 to 8.

12. A computer-readable storage medium having executable instructions stored thereon which, when executed by a processor, implement the neural network training method according to any one of claims 1 to 8.

13. A computer program product comprising a computer program which, when executed by a processor, implements the neural network training method according to any one of claims 1 to 8.
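To make the structure of the multi-scale fusion module in claim 3 concrete, here is a minimal PyTorch-style sketch. The kernel sizes, dilation rates, the 1x1 fusion convolution that restores the channel count after concatenation, and the choice of BatchNorm and ReLU are all illustrative assumptions; the claim fixes only the branch structure (n first convolutional layers whose features are concatenated, a second convolutional layer whose feature is added, then activation and normalization).

```python
import torch
import torch.nn as nn

class MultiScaleFusionBlock(nn.Module):
    """Sketch of one level of the multi-scale fusion module in claim 3."""

    def __init__(self, channels: int, n: int = 3):
        super().__init__()
        # n "first" convolutional layers; increasing dilation is an assumed
        # way to extract features at different scales.
        self.first_convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in range(1, n + 1)
        )
        # Assumed 1x1 convolution bringing the concatenated features back
        # to `channels` so they can be added to the second feature.
        self.fuse = nn.Conv2d(n * channels, channels, kernel_size=1)
        # The "second" convolutional layer of the residual branch.
        self.second_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the first features extracted at different scales
        # to form the first complex features.
        first_complex = self.fuse(torch.cat([c(x) for c in self.first_convs], dim=1))
        # Add the second feature to obtain the second complex features.
        second_complex = first_complex + self.second_conv(x)
        # Normalization and activation (the ordering here is our choice).
        return self.act(self.norm(second_complex))

# Example: one fusion level on a 64-channel feature map.
y = MultiScaleFusionBlock(64)(torch.randn(1, 64, 128, 128))
```

Per claim 2, adjacent blocks of this kind would then be chained with a strided convolution in the encoder and a transposed convolution of the same stride in the decoder.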
CN202110905429.5A 2021-08-06 2021-08-06 Neural network training method and device and image segmentation method Pending CN113822428A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110905429.5A CN113822428A (en) 2021-08-06 2021-08-06 Neural network training method and device and image segmentation method

Publications (1)

Publication Number Publication Date
CN113822428A true CN113822428A (en) 2021-12-21

Family

ID=78912983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110905429.5A Pending CN113822428A (en) 2021-08-06 2021-08-06 Neural network training method and device and image segmentation method

Country Status (1)

Country Link
CN (1) CN113822428A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961032A (en) * 2019-03-18 2019-07-02 北京字节跳动网络技术有限公司 Method and apparatus for generating disaggregated model
CN111325750A (en) * 2020-02-25 2020-06-23 西安交通大学 A medical image segmentation method based on multi-scale fusion U-chain neural network
CN112183258A (en) * 2020-09-16 2021-01-05 太原理工大学 A Road Segmentation Method Based on Context Information and Attention Mechanism in Remote Sensing Image
CN112364699A (en) * 2020-10-14 2021-02-12 珠海欧比特宇航科技股份有限公司 Remote sensing image segmentation method, device and medium based on weighted loss fusion network

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023123926A1 (en) * 2021-12-28 2023-07-06 苏州浪潮智能科技有限公司 Artificial intelligence task processing method and apparatus, electronic device, and readable storage medium
CN114359303B (en) * 2021-12-28 2024-12-24 浙江大华技术股份有限公司 Image segmentation method and device
CN114359303A (en) * 2021-12-28 2022-04-15 浙江大华技术股份有限公司 Image segmentation method and device
WO2023134550A1 (en) * 2022-01-14 2023-07-20 北京有竹居网络技术有限公司 Feature encoding model generation method, audio determination method, and related device
CN114399766A (en) * 2022-01-18 2022-04-26 平安科技(深圳)有限公司 Optical character recognition model training method, device, equipment and medium
CN114511478A (en) * 2022-01-18 2022-05-17 北京世纪好未来教育科技有限公司 Image processing method and device, electronic equipment and storage medium
CN114399766B (en) * 2022-01-18 2024-05-10 平安科技(深圳)有限公司 Optical character recognition model training method, device, equipment and medium
CN114492657A (en) * 2022-02-09 2022-05-13 深延科技(北京)有限公司 Plant disease classification method, device, electronic device and storage medium
CN114565615A (en) * 2022-02-18 2022-05-31 新疆大学 Polyp image segmentation method, device, computer equipment and storage medium
CN114662588A (en) * 2022-03-21 2022-06-24 合肥工业大学 A method, system, device and storage medium for automatically updating a model
CN114662588B (en) * 2022-03-21 2023-11-07 合肥工业大学 Method, system, equipment and storage medium for automatically updating model
CN115115828A (en) * 2022-04-29 2022-09-27 腾讯医疗健康(深圳)有限公司 Data processing method, apparatus, program product, computer equipment and medium
CN114970817A (en) * 2022-05-18 2022-08-30 北京百度网讯科技有限公司 Neural network training method and device and electronic equipment
CN115481694B (en) * 2022-09-26 2023-09-05 南京星环智能科技有限公司 Data enhancement method, device and equipment for training sample set and storage medium
CN115481694A (en) * 2022-09-26 2022-12-16 南京星环智能科技有限公司 Data enhancement method, device, equipment and storage medium for training sample set
CN115331082A (en) * 2022-10-13 2022-11-11 天津大学 Path generation method of tracking sound source, training method of model and electronic equipment
CN115331082B (en) * 2022-10-13 2023-02-03 天津大学 Path generation method of tracking sound source, training method of model and electronic equipment
CN115564966A (en) * 2022-10-17 2023-01-03 浙江网商银行股份有限公司 Image processing model training method and device
CN115496989B (en) * 2022-11-17 2023-04-07 南京硅基智能科技有限公司 Generator, generator training method and method for avoiding image coordinate adhesion
US12056903B2 (en) 2022-11-17 2024-08-06 Nanjing Silicon Intelligence Technology Co., Ltd. Generator, generator training method, and method for avoiding image coordinate adhesion
CN115984302A (en) * 2022-12-19 2023-04-18 中国科学院空天信息创新研究院 Multi-mode remote sensing image processing method based on sparse mixed expert network pre-training
CN115984302B (en) * 2022-12-19 2023-06-06 中国科学院空天信息创新研究院 Multimodal remote sensing image processing method based on sparse mixed expert network pre-training
CN116434007A (en) * 2023-03-31 2023-07-14 中信银行股份有限公司 A target detection model training method and system for small-scale images
CN116523028B (en) * 2023-06-29 2023-10-03 深圳须弥云图空间科技有限公司 Image characterization model training method and device based on image space position
CN116523028A (en) * 2023-06-29 2023-08-01 深圳须弥云图空间科技有限公司 Image characterization model training method and device based on image space position
CN117036355B (en) * 2023-10-10 2023-12-15 湖南大学 Encoder and model training method, fault detection method and related equipment
CN117036355A (en) * 2023-10-10 2023-11-10 湖南大学 Encoder and model training method, fault detection method and related equipment
WO2025139962A1 (en) * 2023-12-27 2025-07-03 苏州镁伽科技有限公司 Image processing model training method, and image processing method and apparatus
CN117853923A (en) * 2024-01-17 2024-04-09 山东盛然电力科技有限公司 Power grid power infrastructure safety evaluation analysis method and device
CN117830645A (en) * 2024-02-23 2024-04-05 中国科学院空天信息创新研究院 Feature extraction network training method, device, equipment and medium
CN118552136A (en) * 2024-07-26 2024-08-27 浪潮智慧供应链科技(山东)有限公司 Big data-based supply chain intelligent inventory management system and method

Similar Documents

Publication Publication Date Title
CN113822428A (en) Neural network training method and device and image segmentation method
US11775574B2 (en) Method and apparatus for visual question answering, computer device and medium
US11768876B2 (en) Method and device for visual question answering, computer apparatus and medium
CN111369581B (en) Image processing method, device, equipment and storage medium
CN109325972B (en) Laser radar sparse depth map processing method, device, equipment and medium
CN113469088B (en) SAR image ship target detection method and system under passive interference scene
WO2022105125A1 (en) Image segmentation method and apparatus, computer device, and storage medium
CN112016569B (en) Attention mechanism-based object detection method, network, device and storage medium
CN114611672B (en) Model training method, face recognition method and device
CN110929780A (en) Video classification model construction method, video classification device, video classification equipment and media
US11281928B1 (en) Querying semantic data from unstructured documents
CN111476719A (en) Image processing method, image processing device, computer equipment and storage medium
CN111311480B (en) Image fusion method and device
US12148131B2 (en) Generating an inpainted image from a masked image using a patch-based encoder
CN113408507B (en) Named Entity Recognition Method, Device and Electronic Device Based on History File
CN115115910A (en) Training method, usage method, device, equipment and medium of image processing model
CN113343981A (en) Visual feature enhanced character recognition method, device and equipment
CN117612188A (en) Text recognition model training method, text recognition device and equipment
CN117216715A (en) Data processing method, training method and device for deep learning model
CN108257081B (en) Method and device for generating pictures
CN119646255A (en) Cross-modal retrieval model training method and remote sensing image text retrieval method
CN113553386A (en) Embedding representation model training method, question answering method and device based on knowledge graph
CN112465737A (en) Image processing model training method, image processing method and image processing device
US20250131756A1 (en) Text recognition method and apparatus, storage medium and electronic device
CN114529750B (en) Image classification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211221
