
CN113822428A - Neural network training method and device and image segmentation method - Google Patents


Info

Publication number: CN113822428A
Application number: CN202110905429.5A
Authority: CN (China)
Prior art keywords: neural network, feature, data set, image, module
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 王春, 陈永录, 张飞燕
Current Assignee: Industrial and Commercial Bank of China Ltd (ICBC) (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Industrial and Commercial Bank of China Ltd (ICBC)
Application filed by: Industrial and Commercial Bank of China Ltd (ICBC)
Priority to: CN202110905429.5A
Publication of: CN113822428A

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a context-coding-based neural network training method and apparatus, and an image segmentation method, which can be applied in the artificial intelligence field, the finance field, or other fields. The training method includes: inputting a training data set and a verification data set into a feature encoder module to obtain feature images, where both data sets include multiple sample images, the sample images include positive and negative samples, and both data sets carry ground-truth labels; inputting the feature images into a context information extraction module to generate high-level feature maps; inputting the high-level feature maps into a feature decoder module to generate predicted segmentation results for the sample images; determining the loss function of the neural network according to the types of the positive and negative samples in the sample images, calculating the loss value between the predicted segmentation result and the ground-truth label based on that loss function, and updating the parameters of the neural network according to the loss value; and outputting the trained neural network.

Description

Neural network training method and device and image segmentation method
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a neural network training method and apparatus based on context coding, an electronic device, a readable storage medium, and an image segmentation method.
Background
Neural networks are applied ever more widely across many fields; in the technical field of image segmentation, a neural network must be trained before it can segment images. In the related art, the positive and negative samples in the sample database are unevenly distributed when an image segmentation network is trained. This imbalance skews the training result, so the final segmentation results vary widely, the boundary of the target region cannot be delineated accurately, and the output does not faithfully reflect the true segmentation.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a neural network training method, apparatus, electronic device, readable storage medium, and image segmentation method based on context coding.
According to a first aspect of the present disclosure, there is provided a method for training a neural network based on context coding, the neural network including a feature encoder module, a context information extraction module, and a feature decoder module. The method includes: inputting a training data set and a verification data set to the feature encoder module to obtain feature images, where each data set comprises a plurality of sample images, the sample images comprise positive and negative samples, and both data sets carry ground-truth labels; inputting the feature images into the context information extraction module, extracting context semantic information, and generating high-level feature maps; inputting the high-level feature maps into the feature decoder module to generate predicted segmentation results for the sample images; determining a loss function of the neural network according to the types of the positive and negative samples in the sample images, calculating a loss value between the predicted segmentation result and the ground-truth label based on that loss function, and updating the parameters of the neural network according to the loss value; and judging whether the number of iterations over all sample images has reached a first threshold, and outputting the trained neural network once it has.
According to an embodiment of the present disclosure, the feature encoder module and the feature decoder module each include multiple stages of multi-scale fusion modules; two adjacent stages of multi-scale fusion modules in the feature encoder module are connected by a convolution with a preset stride; two adjacent stages of multi-scale fusion modules in the feature decoder module are connected by a transposed convolution with a preset stride; and corresponding encoder and decoder stages exchange features through an element-wise addition operation.
According to an embodiment of the present disclosure, each multi-scale fusion module includes: n first convolutional layers, where the features extracted by layers 1 through n-1 are concatenated with the feature extracted by the n-th layer to obtain first complex features at different scales, n being a natural number; a second convolutional layer, which extracts a second feature of the input image and adds it to the first complex features to obtain second complex features; and an activation function layer and a normalization layer, which process the second complex features and output the result.
According to an embodiment of the present disclosure, before the training data set and the verification data set are input to the feature encoder module, the method further includes preprocessing the sample images, the preprocessing including: converting the format of the sample image to generate a target-format image; scaling the target-format image to generate a target-size image; and applying normalization and binarization to the target-size image.
According to an embodiment of the present disclosure, after preprocessing, the sample images are randomly divided into a training data set and a verification data set at a set ratio, and data enhancement is applied to the training data set; the data enhancement includes at least one of translation, random cropping, and contrast transformation of the binarized images.
According to an embodiment of the present disclosure, determining the loss function of the neural network according to the types of the positive and negative samples in the sample image comprises: when the sample image input to the neural network is determined to be a positive sample, computing the loss function by formula (1):

$$L_{pos} = 1 - \frac{2\sum_i G_i P_i}{\sum_i G_i + \sum_i P_i} \tag{1}$$

where G is the ground-truth label and P is the prediction result.
According to an embodiment of the present disclosure, determining the loss function of the neural network according to the types of the positive and negative samples in the sample image further comprises: when the sample image input to the neural network is determined to be a negative sample, computing the loss function by formula (2):

$$L_{neg} = \frac{1}{N}\sum_{i:\,P_i > t} \left|G_i - P_i\right| \tag{2}$$

where G is the ground-truth label, P is the prediction result, t is the second threshold, and N is the number of activations corresponding to negative samples in the training data set that exceed the second threshold.
According to an embodiment of the present disclosure, calculating a loss value between the predicted segmentation result and the ground-truth label based on the loss function of the neural network, and updating the parameters of the neural network according to the loss value, includes: calculating a first loss value between the predicted segmentation result and the ground-truth label during training, using the loss function obtained on the training data set; calculating a second loss value between the predicted segmentation result and the ground-truth label during verification, using the loss function obtained on the verification data set; and updating the parameters of the neural network if the second loss value is less than the first loss value.
A second aspect of the present disclosure provides an image segmentation method in which a target image is input to a neural network to obtain an image segmentation result, the neural network having been trained using the method described above.
A third aspect of the present disclosure provides a context-coding-based neural network training apparatus, the neural network including a feature encoder module, a context information extraction module, and a feature decoder module, the apparatus including: an input module configured to input a training data set and a verification data set to the feature encoder module to obtain feature images, where each data set comprises a plurality of sample images, the sample images comprise positive and negative samples, and both data sets carry ground-truth labels; an extraction module configured to input the feature images to the context information extraction module, extract context semantic information, and generate high-level feature maps; a generation module configured to input the high-level feature maps into the feature decoder module and generate predicted segmentation results for the sample images; an updating module configured to determine a loss function of the neural network according to the types of the positive and negative samples in the sample images, calculate a loss value between the predicted segmentation result and the ground-truth label based on that loss function, and update the parameters of the neural network according to the loss value; and an output module configured to judge whether the number of iterations over all sample images has reached a first threshold, and to output the trained neural network once it has.
According to an embodiment of the present disclosure, the neural network training apparatus further includes a preprocessing module configured to, before the training data set and the verification data set are input to the feature encoder module, convert the format of the sample images to generate target-format images, scale those images to generate target-size images, and apply normalization and binarization to the target-size images.
According to an embodiment of the present disclosure, the neural network training apparatus further includes a data enhancement module; after preprocessing, the sample images are randomly divided into a training data set and a verification data set at a set ratio, and the data enhancement module applies translation and/or random cropping and/or contrast transformation to the binarized images of the training data set.
According to an embodiment of the present disclosure, the update module includes an update submodule configured to calculate a first loss value between the predicted segmentation result and the ground-truth label during training, using the loss function obtained on the training data set; calculate a second loss value between the predicted segmentation result and the ground-truth label during verification, using the loss function obtained on the verification data set; and update the parameters of the neural network if the second loss value is less than the first loss value.
A fourth aspect of the present disclosure provides an electronic device, comprising: one or more processors; a storage device for storing executable instructions that, when executed by the processor, implement a neural network training method in accordance with the foregoing.
A fifth aspect of the present disclosure provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, implement a neural network training method in accordance with the foregoing.
A sixth aspect of the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements a neural network training method in accordance with the above.
According to the embodiments of the present disclosure, a neural network with a context information extraction module is constructed, and during training the loss function is determined by the types of the positive and negative samples in the sample data set. Training with different loss functions for different sample types effectively avoids the vanishing-gradient situation caused by differing numbers of positive and negative samples, and thus effectively improves the accuracy of the network parameters obtained during training.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
fig. 1 schematically shows a schematic diagram of a system architecture to which the neural network training method of the embodiments of the present disclosure may be applied;
FIG. 2 schematically illustrates a flow diagram of a neural network training method in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates a structural diagram of a multi-scale fusion module of a neural network according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow diagram for preprocessing a sample image for a neural network, in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates a structural diagram of the dilated convolution module of a neural network in accordance with an embodiment of the present disclosure;
FIG. 6 schematically illustrates a structural schematic of a multi-scale pooling module of a neural network according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a structural diagram of a context coding based neural network according to an embodiment of the present disclosure;
fig. 8 schematically shows a block diagram of a training apparatus for a context coding based neural network according to an embodiment of the present disclosure; and
fig. 9 schematically shows a block diagram of an electronic device adapted for a neural network training method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
With the rapid development of the internet finance industry, the requirements on user information security keep rising. To guarantee this security, ever more security technologies, such as face recognition, are applied in the information security field. Face information cannot be copied or stolen and is simple, convenient, and intuitive to use, which makes it a focus of current security applications. During face identification and verification, however, the acquired face information must be processed, for example by segmenting the face image, to enable accurate comparison of face information.
In the related art, researchers have applied the U-Net model to face segmentation. The model uses an encoder-decoder structure divided into a down-sampling stage and an up-sampling stage; the network contains only convolutional and pooling layers, with no fully connected layer. The shallow, high-resolution layers solve pixel localization, while the deep layers solve pixel classification, enabling semantic-level segmentation. However, the successive pooling or strided convolution operations in the encoder lose some spatial information. Although such spatial invariance helps in classification and object detection tasks, it tends to hinder dense prediction tasks that require detailed spatial information, and it limits the model's ability to learn shape variation, so when segmented objects differ greatly (for example, faces at different angles, with different skin colors and organ shapes), the boundary of the target region cannot be drawn accurately. In addition, the face segmentation task suffers from an unbalanced distribution of positive and negative samples, and even similar samples can differ in classification difficulty.
In view of this, embodiments of the present disclosure provide a context-coding-based neural network training method and apparatus, and an image segmentation method. The neural network includes a feature encoder module, a context information extraction module, and a feature decoder module, and the training method includes: inputting a training data set and a verification data set to the feature encoder module to obtain feature images, where both data sets comprise a plurality of sample images, the sample images comprise positive and negative samples, and both data sets carry ground-truth labels; inputting the feature images into the context information extraction module, extracting context semantic information, and generating high-level feature maps; inputting the high-level feature maps into the feature decoder module to generate predicted segmentation results for the sample images; determining a loss function of the neural network according to the types of the positive and negative samples in the sample images, calculating a loss value between the predicted segmentation result and the ground-truth label based on that loss function, and updating the parameters of the neural network according to the loss value; and judging whether the number of iterations over all sample images has reached a first threshold, and outputting the trained neural network once it has.
According to the embodiments of the present disclosure, a neural network with a context information extraction module is constructed, and during training the loss function is determined by the types of the positive and negative samples in the sample data set. Training with different loss functions for different sample types effectively avoids the vanishing-gradient situation caused by differing numbers of positive and negative samples, and thus effectively improves the accuracy of the network parameters obtained during training.
Fig. 1 schematically shows a schematic diagram of a system architecture to which the neural network training method of the embodiments of the present disclosure may be applied. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. It should be noted that the neural network training method and apparatus based on context coding provided by the embodiment of the present disclosure may be used in the related aspects of image processing in the technical field of artificial intelligence, the technical field of image processing, and the financial field, and may also be used in any field other than the financial field.
As shown in fig. 1, an exemplary system architecture 100 to which the neural network training method may be applied may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
Users may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104, to receive or send messages and the like. Various image acquisition client applications, such as a photo application or a face-image capture application (for example only), can be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having display screens and supporting face information acquisition, including but not limited to smart phones, smart televisions, tablet computers, laptop portable computers, desktop computers, and the like. The terminal devices 101, 102, and 103 may acquire face image information of the user or other people through face information acquisition, and transmit the face image information to the server 105 through the network 104.
The server 105 may be a server that provides various services, such as a background management server (for example only) that processes or stores pictures sent by users using the terminal devices 101, 102, 103. The background management server may perform processing such as analysis on the received user data (e.g., a character image), and feed back a processing result (e.g., a segmentation or recognition result after image processing) to the terminal device.
It should be noted that the training method of the neural network provided by the embodiment of the present disclosure may be generally performed by the terminal devices 101, 102, 103 or the server 105. Accordingly, the neural network training device provided by the embodiment of the present disclosure may be generally disposed in the terminal device 101, 102, 103 or the server 105. The neural network training method provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the neural network training device provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The context coding-based neural network training method of the disclosed embodiments will be described in detail below with reference to fig. 2 to 7.
Fig. 2 schematically shows a flow diagram of a neural network training method according to an embodiment of the present disclosure. According to an embodiment of the present disclosure, the neural network includes a feature encoder module, a context information extraction module, and a feature decoder module.
As shown in fig. 2, the neural network training method 200 of the embodiment of the present disclosure includes operations S210 to S250.
In operation S210, a training data set and a verification data set are input to the feature encoder module to obtain feature images, where both data sets include a plurality of sample images, the sample images include positive and negative samples, and both data sets carry ground-truth labels.
For example, a plurality of images to be segmented in a data set are processed to obtain a training data set and a verification data set: the training data set is used to train the neural network, and the verification data set to validate the trained network. Both carry ground-truth labels, against which the training result can be judged and operations such as adjusting the network parameters can be performed. In the embodiments of the present disclosure, the sample images include positive and negative samples, whose numbers differ depending on the data set selected each time.
In operation S220, the feature images are input to the context information extraction module, context semantic information is extracted, and high-level feature maps are generated.
For example, the context information extraction module extracts the context semantic information in a picture to generate a high-level feature map.
In operation S230, the high-level feature maps are input to the feature decoder module, and predicted segmentation results of the sample images are generated.
In operation S240, a loss function of the neural network is determined according to the types of the positive and negative samples in the sample image, a loss value between the predicted segmentation result and the ground-truth label is calculated based on that loss function, and the parameters of the neural network are updated according to the loss value.
For example, the sample images include positive and negative samples. For a positive sample, the Dice loss can be used to compute the loss between the predicted segmentation result and the ground-truth label. For a negative sample, however, the Dice coefficient equals 0, so the Dice loss equals 1 and its gradient is 0: the sample then has no effect on the model during training, pixels in the target region cannot be classified correctly, and both background and target regions tend to be predicted as background. Therefore, when the sample input to the neural network is negative, a new loss function different from the Dice loss is determined for it. This effectively avoids the zero-gradient situation, improves the discrimination of background and target regions during training, and improves the computational accuracy of neural network training.
In an embodiment of the present disclosure, the feature encoder module and the feature decoder module each comprise multiple stages of multi-scale fusion modules; two adjacent stages in the feature encoder module are connected by a convolution with a preset stride, two adjacent stages in the feature decoder module are connected by a transposed convolution with a preset stride, and corresponding stages exchange features through an addition operation.
For example, the feature encoder module includes 4 multi-scale fusion modules, with a 3 × 3 convolution of stride 2 between every two adjacent modules; the feature decoder module includes 4 multi-scale fusion modules, with a 3 × 3 transposed convolution of stride 2 between every two adjacent modules. The multi-scale fusion modules in the encoder and those in the decoder correspond one-to-one, and feature addition is performed between corresponding modules.
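The following minimal PyTorch sketch illustrates these inter-stage connections; the channel counts (64 and 128) and padding choices are assumptions for illustration, not values from the patent.

```python
import torch
import torch.nn as nn

# Assumed illustration: a 3x3 stride-2 convolution downsamples between adjacent
# encoder stages, and a 3x3 stride-2 transposed convolution upsamples between
# adjacent decoder stages, so corresponding features can be added element-wise.
down = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
up = nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2,
                        padding=1, output_padding=1)

x = torch.randn(1, 64, 64, 64)
y = down(x)                      # -> (1, 128, 32, 32)
assert up(y).shape == x.shape    # restored to (1, 64, 64, 64)
```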
Fig. 3 schematically shows a structural diagram of a multi-scale fusion module 300 of a neural network according to an embodiment of the present disclosure.
As shown in fig. 3, each stage of the multi-scale fusion module 300 includes: n first convolutional layers, where the features extracted by layers 1 through n-1 are concatenated with the feature extracted by the n-th layer to obtain first complex features at different scales, n being a natural number; a second convolutional layer, which extracts a second feature of the input image and adds it to the first complex features to obtain second complex features; and an activation function layer and a normalization layer, which process the second complex features and output the result.
For example, the multi-scale fusion module includes 3 first convolutional layers 302, 303, 304, each of which may be a 3 × 3 convolution. The first convolution 302 and the second convolution 303 extract first features of the input image 301, and residual connections are added after them: the features extracted by 302 and 303 are concatenated with the feature produced by the third convolution 304, yielding first complex features at different scales. To keep the numbers of input and output channels the same, the filter counts of the three consecutive convolutional layers are set to (0.2, 0.3, 0.5) × Input_channel respectively, where Input_channel is the number of input channels.
For example, the second convolutional layer 305 may be a 1 × 1 convolution that extracts the second feature of the input image 301; this 1 × 1 convolution is applied to the input, and its output is added to the first complex features obtained above to generate the second complex features.
For example, the activation function 306 may be ReLU, and the normalization 307 may be Batch Normalization; after normalization, the result serves as the input of the next convolution or transposed convolution and is finally emitted through the output 308.
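The block below is a minimal PyTorch sketch of such a fusion module under the description above; the exact wiring of the residual branch, the padding, and the BN/ReLU ordering are assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Three cascaded 3x3 convs whose outputs are concatenated (first complex
    feature), plus a 1x1 shortcut added on top (second complex feature),
    followed by BatchNorm and ReLU."""
    def __init__(self, in_ch):
        super().__init__()
        # Filter counts (0.2, 0.3, 0.5) * in_ch keep input/output channels equal;
        # the last count absorbs rounding so the concatenation sums to in_ch.
        c1, c2 = int(0.2 * in_ch), int(0.3 * in_ch)
        c3 = in_ch - c1 - c2
        self.conv1 = nn.Conv2d(in_ch, c1, 3, padding=1)
        self.conv2 = nn.Conv2d(c1, c2, 3, padding=1)
        self.conv3 = nn.Conv2d(c2, c3, 3, padding=1)
        self.shortcut = nn.Conv2d(in_ch, in_ch, 1)   # 1x1 conv on the input
        self.bn = nn.BatchNorm2d(in_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        f1 = self.conv1(x)
        f2 = self.conv2(f1)
        f3 = self.conv3(f2)
        complex1 = torch.cat([f1, f2, f3], dim=1)     # splice multi-scale features
        complex2 = complex1 + self.shortcut(x)        # add the 1x1 second feature
        return self.relu(self.bn(complex2))
```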
In operation S250, it is determined whether the number of iterations over all sample images has reached a first threshold; once it has, the trained neural network is output.
For example, the first threshold is an epoch value: the method judges whether the number of iterations over all sample images in the data set has reached a preset epoch value, from which the degree of training of the neural network can be determined. The epoch value may lie between 100 and 200; if it is set to 150, training stops once 150 epochs are exceeded and the trained network is output. If the epoch value has not yet reached the first threshold, sample images continue to be fed in for further iterations; once the first threshold is reached, the training process ends and the trained neural network is output.
Fig. 4 schematically illustrates a flow diagram of preprocessing a sample image of a neural network according to an embodiment of the present disclosure.
In an embodiment of the present disclosure, before the training data set and the verification data set are input to the feature encoder module, the method further includes preprocessing the sample images, the preprocessing including: converting the format of the sample image to generate a target-format image; scaling the target-format image to generate a target-size image; and applying normalization and binarization to the target-size image.
As shown in fig. 4, the flow 400 of preprocessing the sample image includes operations S410 to S430.
In operation S410, the sample image is format-converted, generating a target format image.
For example, collected sample images come in various formats, while the images input to the neural network must satisfy a specific format; the sample images are therefore format-converted as needed to generate target-format images that meet the requirement.
In operation S420, the target format image is image-scaled, generating a target size image.
For example, the target-format image is scaled to the required size; for instance, the image dimensions are set to 256 × 256 pixels.
In operation S430, the target size image is subjected to normalization processing and binarization processing.
For example, the scaled target-size image is normalized to the [0, 1] interval, and the normalized image is then binarized: with a threshold of 0.5, pixel values above the threshold are set to 1 and values below it to 0.
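A minimal sketch of this preprocessing pipeline is shown below; the use of Pillow and NumPy, the grayscale conversion, and the function name are assumptions for illustration.

```python
import numpy as np
from PIL import Image

def preprocess(path, size=(256, 256), threshold=0.5):
    """Format-convert, scale to the target size, normalize to [0, 1],
    then binarize at the given threshold."""
    img = Image.open(path).convert("L")              # format conversion
    img = img.resize(size)                           # target-size image
    arr = np.asarray(img, dtype=np.float32) / 255.0  # normalize to [0, 1]
    return (arr > threshold).astype(np.float32)      # > 0.5 -> 1, else 0
```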
According to the embodiment of the disclosure, after the sample image is preprocessed, the method further comprises the steps of randomly dividing the sample image into a training data set and a verification data set according to a set proportion, and performing data enhancement on the training data set. The data enhancement comprises at least one of translation transformation, random cutting and contrast transformation of the image after the binarization processing.
For example, the sample images are randomly divided into a training data set and a verification data set at a set ratio, e.g. 7 : 3, one part forming the training set and the other the verification set.
In the embodiment of the disclosure, data enhancement is performed through the training data set, so that the training data volume can be increased, and the neural network obtained through training has higher accuracy.
For example, the data enhancement may be a shift transform (shift), i.e. a shift of the original picture in some way (the step size, range and direction of the shift being determined in a predefined or random manner) within the image plane.
The data enhancement may also be Random Crop (Random Crop), i.e. randomly defining the region of interest to Crop the image, corresponding to adding Random perturbations.
The data enhancement may also be a Contrast transform (Contrast), i.e. changing the image Contrast, which is equivalent to keeping the hue component H constant and changing the brightness component V and saturation S in HSV space, for simulating the illumination change of the real environment.
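The sketch below applies these three enhancements at random to a normalized image array; the probabilities and parameter ranges are illustrative assumptions, not values from the patent.

```python
import random
import numpy as np

def augment(img):
    """Randomly apply translation, random crop (zero-padded back to size),
    and contrast transformation to a float image in [0, 1]."""
    h, w = img.shape
    if random.random() < 0.5:                 # translation within the image plane
        dx, dy = random.randint(-10, 10), random.randint(-10, 10)
        img = np.roll(img, (dy, dx), axis=(0, 1))
    if random.random() < 0.5:                 # random crop of a region of interest
        top, left = random.randint(0, h // 8), random.randint(0, w // 8)
        out = np.zeros_like(img)
        crop = img[top:, left:]
        out[:crop.shape[0], :crop.shape[1]] = crop
        img = out
    if random.random() < 0.5:                 # contrast: scale deviations from 0.5
        img = np.clip(0.5 + random.uniform(0.8, 1.2) * (img - 0.5), 0.0, 1.0)
    return img
```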
In an embodiment of the present disclosure, determining the loss function of the neural network according to the types of the positive and negative samples in the sample image includes: when the sample image input to the neural network is determined to be a positive sample, computing the loss function by formula (1):

$$L_{pos} = 1 - \frac{2\sum_i G_i P_i}{\sum_i G_i + \sum_i P_i} \tag{1}$$

where G is the ground-truth label and P is the prediction result.
For example, when the sample image input to the neural network is determined to be a positive sample, the loss function is the standard Dice loss of formula (1).
In an embodiment of the present disclosure, determining the loss function of the neural network according to the sample types further includes: when the sample image input to the neural network is determined to be a negative sample, taking as the new loss function the L1 norm over the elements of the model output that exceed the second threshold, computed by formula (2):

$$L_{neg} = \frac{1}{N}\sum_{i:\,P_i > t} \left|G_i - P_i\right| \tag{2}$$

where G is the ground-truth label, P is the prediction result, t is the second threshold, and N is the number of activations corresponding to negative samples in the training data set that exceed the second threshold.
For example, the second threshold may lie between 0 and 1, preferably 0.001.
according to the embodiment of the disclosure, different loss functions are determined by determining different types of samples input into the neural network, so that the situation that the gradient is 0 when the positive and negative samples are inconsistent can be avoided, the recognition accuracy of the background area and the target area in the neural network training process is improved, and the calculation accuracy of the neural network training is improved.
In an embodiment of the present disclosure, calculating a loss value between the predicted segmentation result and the ground-truth label based on the loss function of the neural network, and updating the parameters of the neural network according to the loss value, includes: calculating a first loss value between the predicted segmentation result and the ground-truth label during training, using the loss function obtained on the training data set; calculating a second loss value between the predicted segmentation result and the ground-truth label during verification, using the loss function obtained on the verification data set; and updating the parameters of the neural network if the second loss value is less than the first loss value.
For example, the first loss value is the loss computed during training and the second loss value the loss computed during verification. If the second loss value is greater than the first, the parameters of the neural network have reached the expected result during training, and operation S250 of the disclosed embodiment is performed. If the second loss value is smaller than the first, the parameters meet the requirements during verification; the currently trained neural network model is saved to update the network parameters, and operation S250 is then performed.
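A small sketch of this update rule, which also keeps the best verification loss seen so far (the best-value check is described later in the training procedure); the file name and the use of torch.save are assumptions.

```python
import torch

best_val_loss = float("inf")

def maybe_update(model, train_loss, val_loss):
    """Save the current parameters when the verification loss is below the
    training loss and improves on the best verification loss so far."""
    global best_val_loss
    if val_loss < train_loss and val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), "best_model.pth")
```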
In the embodiments of the present disclosure, the context information extraction module of the neural network is composed of a dense dilated-convolution module and a residual multi-scale pooling module.
Fig. 5 schematically shows a structural diagram of the dense dilated-convolution module of a neural network according to an embodiment of the present disclosure.
As shown in fig. 5, the dense dilated-convolution module stacks convolutions in a cascading fashion over four branches: the dilation rates increase through 1, 1, 3 and 5, and the branches have receptive fields of 3, 7, 9 and 19 respectively, so that feature information is captured at multiple scales. A 1 × 1 convolutional layer at the end of each dilated branch performs linear rectification. Finally, using the shortcut mechanism of ResNet, the original features are added position by position to the outputs of the parallel branches.
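The sketch below composes four branches whose receptive fields are 3, 7, 9 and 19, matching the description above; the exact per-branch layer ordering is an assumption.

```python
import torch.nn as nn

class DenseDilatedBlock(nn.Module):
    """Four cascaded dilated-convolution branches (receptive fields 3, 7, 9,
    19), each non-trivial branch ending in a 1x1 conv, summed with the input
    via a ResNet-style shortcut."""
    def __init__(self, ch):
        super().__init__()
        def c3(d):  # 3x3 dilated conv preserving spatial size
            return nn.Conv2d(ch, ch, 3, padding=d, dilation=d)
        self.b1 = c3(1)                                              # RF 3
        self.b2 = nn.Sequential(c3(3), nn.Conv2d(ch, ch, 1))         # RF 7
        self.b3 = nn.Sequential(c3(1), c3(3), nn.Conv2d(ch, ch, 1))  # RF 9
        self.b4 = nn.Sequential(c3(1), c3(3), c3(5),
                                nn.Conv2d(ch, ch, 1))                # RF 19
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Shortcut: add the original features to every branch output.
        return x + self.relu(self.b1(x)) + self.relu(self.b2(x)) \
                 + self.relu(self.b3(x)) + self.relu(self.b4(x))
```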
Fig. 6 schematically illustrates a structural schematic diagram of a multi-scale pooling module of a neural network according to an embodiment of the present disclosure.
As shown in fig. 6, the multi-scale pooling module encodes global context information using four fields of view of different sizes, 2 × 2, 3 × 3, 5 × 5 and 6 × 6, the four levels producing feature maps of various sizes. To reduce the dimensionality of the weights and the computational cost, a 1 × 1 convolution after each pooling level reduces its channel count to 1/M of the original input, where M is the number of channels of the original input. The low-dimensional feature map of each level is then up-sampled to the size of the original input feature map by bilinear interpolation, and finally the original features are concatenated with the up-sampled feature maps of the several levels.
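A sketch of this pooling module follows; adaptive average pooling is assumed for the 2 × 2, 3 × 3, 5 × 5 and 6 × 6 fields of view, and each 1 × 1 conv reduces its level to a single channel (1/M of the input with M input channels).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePooling(nn.Module):
    """Pool at four scales, reduce each level to 1 channel with a 1x1 conv,
    bilinearly upsample to the input size, and concatenate with the input."""
    def __init__(self, ch, sizes=(2, 3, 5, 6)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveAvgPool2d(s) for s in sizes])
        self.convs = nn.ModuleList([nn.Conv2d(ch, 1, 1) for _ in sizes])

    def forward(self, x):
        h, w = x.shape[2:]
        outs = [x]
        for pool, conv in zip(self.pools, self.convs):
            y = conv(pool(x))                        # low-dimensional level map
            outs.append(F.interpolate(y, size=(h, w), mode="bilinear",
                                      align_corners=False))
        return torch.cat(outs, dim=1)                # ch + 4 output channels
```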
Fig. 7 schematically shows a structural diagram of a context coding-based neural network according to an embodiment of the present disclosure.
As shown in fig. 7, the neural network includes a feature encoder module, a context information extraction module, and a feature decoder module. The feature encoder module and the feature decoder module each comprise 4 multi-scale fusion modules; the fusion modules of the encoder and the decoder correspond one-to-one and add their features to each other. Adjacent fusion modules in the feature encoder module are connected by a 3 × 3 convolution of stride 2, and adjacent fusion modules in the feature decoder module by a 3 × 3 transposed convolution of stride 2.
In the training process of the embodiments of the present disclosure, the sample images are first preprocessed as described above (format conversion, image scaling, normalization, and binarization), then randomly divided into a training data set and a verification data set at a set ratio, with data enhancement applied to the training data set. The neural network is then trained on the training data set and validated on the verification data set. For example: the convolution kernel weights and the loss function of the neural network are first initialized to 0; the training data set is input into the context-coding-based neural network; the training data and the parameters of each node in the network are computed to realize the forward propagation of network training; the difference between the forward-propagated predicted segmentation result and the ground-truth label is computed as the loss, using formula (1) above for positive samples and formula (2) above for negative samples. Next, the verification data set is input into the neural network for verification; based on the same way of determining the loss function from the sample types, the loss between the predicted segmentation result and the ground-truth label during verification is computed. If this verification loss is smaller than the smallest loss seen in previous verification rounds, the currently trained network parameters are saved. Finally, it is judged whether the current iteration count has reached the preset epoch value; if not, the next iteration begins, and once it has, training is complete and the trained neural network is output.
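The following sketch strings these steps together, reusing the segmentation_loss function sketched earlier; the data-loader format (image, label, positive-sample flag), the Adam optimizer, and the learning rate are assumptions.

```python
import torch

def train(model, train_loader, val_loader, epochs=150, lr=1e-3):
    """Train with the sample-type-dependent loss, validate each epoch, save
    parameters on an improved verification loss, and stop at the epoch value."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_val = float("inf")
    for epoch in range(epochs):                       # first threshold: epochs
        model.train()
        for img, label, is_positive in train_loader:
            opt.zero_grad()
            loss = segmentation_loss(model(img), label, is_positive)
            loss.backward()                           # back-propagate the loss
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(segmentation_loss(model(img), label, is_positive).item()
                      for img, label, is_positive in val_loader)
        if val < best_val:                            # improved verification loss
            best_val = val
            torch.save(model.state_dict(), "best_model.pth")
    return model
```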
In the embodiments of the present disclosure, there is also provided an image segmentation method, comprising inputting a target image into a neural network to obtain an image segmentation result, the neural network having been trained using the training method described above.
According to the embodiments of the present disclosure, the trained neural network is used for image segmentation, for example within face recognition. Because face angles and backgrounds differ, the acquired face images vary widely, and the distribution of positive and negative samples in face image segmentation is unbalanced; segmenting face images with a neural network trained by the method above therefore improves the accuracy of image segmentation.
Fig. 8 schematically shows a block diagram of a training apparatus for a neural network based on context coding according to an embodiment of the present disclosure.
As shown in fig. 8, the neural network training device 800 of the embodiment of the present disclosure includes an input module 810, an extraction module 820, a generation module 830, an update module 840, and an output module 850.
The input module 810 is configured to input a training data set and a verification data set to the feature encoder module to obtain feature images, where both data sets include a plurality of sample images, the sample images include positive and negative samples, and both data sets carry ground-truth labels. In an embodiment, the input module 810 may be configured to perform operation S210 described above, which is not repeated here.
The extraction module 820 is configured to input the feature images to the context information extraction module, extract context semantic information, and generate high-level feature maps. In an embodiment, the extraction module 820 may be configured to perform operation S220 described above, which is not repeated here.
The generation module 830 is configured to input the high-level feature maps to the feature decoder module and generate predicted segmentation results for the sample images. In an embodiment, the generation module 830 may be configured to perform operation S230 described above, which is not repeated here.
The update module 840 is configured to determine a loss function of the neural network according to the types of the positive and negative samples in the sample images, calculate a loss value between the predicted segmentation result and the ground-truth label based on that loss function, and update the parameters of the neural network according to the loss value. In an embodiment, the update module 840 may be configured to perform operation S240 described above, which is not repeated here.
The output module 850 is configured to judge whether the number of iterations over all sample images has reached a first threshold, and to output the trained neural network once it has. In an embodiment, the output module 850 may be configured to perform operation S250 described above, which is not repeated here.
In an embodiment of the present disclosure, the neural network training apparatus further includes a preprocessing module configured to, before the training data set and the verification data set are input to the feature encoder module, convert the format of the sample images to generate target-format images, scale those images to generate target-size images, and apply normalization and binarization to the target-size images.
In an embodiment of the present disclosure, the neural network training apparatus further includes a data enhancement module; after preprocessing, the sample images are randomly divided into a training data set and a verification data set at a set ratio, and the data enhancement module applies translation and/or random cropping and/or contrast transformation to the binarized images of the training data set.
In an embodiment of the present disclosure, the update module includes an update submodule configured to calculate a first loss value between the predicted segmentation result and the ground-truth label during training, using the loss function obtained on the training data set; calculate a second loss value between the predicted segmentation result and the ground-truth label during verification, using the loss function obtained on the verification data set; and update the parameters of the neural network if the second loss value is less than the first loss value.
According to the embodiment of the present disclosure, any plurality of the input module 810, the extraction module 820, the generation module 830, the update module 840, the output module 850, the preprocessing module, the data enhancement module, and the update sub-module may be combined into one module to be implemented, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the input module 810, the extraction module 820, the generation module 830, the update module 840, the output module 850, the preprocessing module, the data enhancement module, and the update sub-module may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware, and firmware, or in any suitable combination of any of them. Alternatively, at least one of the input module 810, the extraction module 820, the generation module 830, the update module 840, the output module 850, the preprocessing module, the data enhancement module, and the update sub-module may be at least partially implemented as a computer program module that, when executed, may perform a corresponding function.
Fig. 9 schematically illustrates a block diagram of an electronic device adapted to implement a neural network training method in accordance with an embodiment of the present disclosure. The electronic device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 9, an electronic apparatus 900 according to an embodiment of the present disclosure includes a processor 901 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. Processor 901 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, which is likewise connected to the bus 904. The electronic device 900 may further include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output portion 907 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD) and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card or a modem. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as necessary, so that a computer program read therefrom can be installed into the storage portion 908 as needed.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments or may exist separately without being assembled into that apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement the neural network training method according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include the ROM 902 and/or the RAM 903 described above and/or one or more memories other than the ROM 902 and the RAM 903.
Embodiments of the present disclosure also include a computer program product comprising a computer program that contains program code for performing the method illustrated in the flowcharts. When the computer program product runs in a computer system, the program code causes the computer system to implement the neural network training method provided by the embodiments of the present disclosure.
When executed by the processor 901, the computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure. According to embodiments of the present disclosure, the systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may be transmitted and distributed as a signal over a network medium, downloaded and installed through the communication section 909, and/or installed from the removable medium 911. The program code contained in the computer program may be transmitted using any suitable medium, including but not limited to wireless, wireline, or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, the program code for carrying out the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages and/or assembly/machine languages. Such languages include, but are not limited to, Java, C++, Python, and C. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or sub-combined in various ways, even if such combinations are not expressly recited in the present disclosure. In particular, such combinations and/or sub-combinations may be made without departing from the spirit or teaching of the present disclosure, and all of them fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (13)

1. A neural network training method based on context encoding, the neural network comprising a feature encoder module, a context information extraction module, and a feature decoder module, the method comprising:
inputting a training data set and a verification data set into the feature encoder module to obtain feature images, wherein the training data set and the verification data set each comprise a plurality of sample images, the sample images comprise positive samples and negative samples, and the training data set and the verification data set have true labels;
inputting the feature images into the context information extraction module to extract contextual semantic information and generate a high-level feature map;
inputting the high-level feature map into the feature decoder module to generate a predicted segmentation result of the sample images;
determining a loss function of the neural network according to the types of the positive samples and negative samples in the sample images, calculating a loss value between the predicted segmentation result and the true label based on the loss function of the neural network, and updating the parameters of the neural network according to the loss value; and
judging whether the number of iterations over all sample images has reached a first threshold, and outputting the trained neural network once the first threshold is reached.

2. The neural network training method according to claim 1, wherein the feature encoder module and the feature decoder module each comprise multi-level multi-scale fusion modules;
adjacent multi-scale fusion modules in the feature encoder module are connected by a convolution operation with a predetermined stride, the convolution operation being used for feature addition between them; and
adjacent multi-scale fusion modules in the feature decoder module are connected by a transposed convolution operation with the predetermined stride, the transposed convolution operation being used for feature addition between them.

3. The neural network training method according to claim 2, wherein each level of the multi-level multi-scale fusion modules comprises:
n first convolutional layers, wherein the 1st to (n-1)th first convolutional layers extract first features of an input image, and the extracted first features are concatenated with the first feature extracted by the nth first convolutional layer to obtain first complex features at different scales, n being a natural number;
a second convolutional layer, which extracts a second feature of the input image and adds the extracted second feature to the first complex features to generate second complex features; and
an activation function layer and a normalization layer, which process the second complex features and output the processing result.

4. The neural network training method according to claim 1, wherein before the training data set and the verification data set are input into the feature encoder module, the method further comprises preprocessing the sample images, the preprocessing comprising:
converting the format of the sample images to generate target-format images;
scaling the target-format images to generate target-size images; and
normalizing and binarizing the target-size images.

5. The neural network training method according to claim 4, wherein after the sample images are preprocessed, the method further comprises randomly dividing the sample images into a training data set and a verification data set according to a set proportion, and performing data enhancement on the training data set;
wherein the data enhancement comprises at least one of translation transformation, random cropping, and contrast transformation applied to the binarized images.

6. The neural network training method according to claim 1, wherein determining the loss function of the neural network according to the types of the positive samples and negative samples in the sample images comprises:
when the sample image input to the neural network is determined to be a positive sample, calculating the loss function by the formula shown in figure FDA0003199644070000021, where G is the true label and P is the predicted result.

7. The neural network training method according to claim 6, wherein determining the loss function of the neural network according to the types of the positive samples and negative samples in the sample images further comprises:
when the sample image input to the neural network is determined to be a negative sample, calculating the loss function by the formula shown in figure FDA0003199644070000031, where G is the true label, P is the predicted result, and N denotes the number of samples in the training data set whose negative-sample activation values are greater than a second threshold.

8. The neural network training method according to claim 7, wherein calculating the loss value between the predicted segmentation result and the true label based on the loss function of the neural network, and updating the parameters of the neural network according to the loss value, comprises:
calculating, using the loss function obtained based on the training data set, a first loss value between the predicted segmentation result and the true label during training;
calculating, using the loss function obtained based on the verification data set, a second loss value between the predicted segmentation result and the true label during verification; and
updating the parameters of the neural network when the second loss value is smaller than the first loss value.

9. An image segmentation method, wherein a target image is input into a neural network to obtain an image segmentation result, the neural network having been trained using the method according to any one of claims 1 to 8.

10. A neural network training device based on context encoding, the neural network comprising a feature encoder module, a context information extraction module, and a feature decoder module, the device comprising:
an input module configured to input a training data set and a verification data set into the feature encoder module to obtain feature images, wherein the training data set and the verification data set each comprise a plurality of sample images, the sample images comprise positive samples and negative samples, and the training data set and the verification data set have true labels;
an extraction module configured to input the feature images into the context information extraction module, extract contextual semantic information, and generate a high-level feature map;
a generation module configured to input the high-level feature map into the feature decoder module to generate a predicted segmentation result of the sample images;
an update module configured to determine the loss function of the neural network according to the types of the positive samples and negative samples in the sample images, calculate the loss value between the predicted segmentation result and the true label based on the loss function of the neural network, and update the parameters of the neural network according to the loss value; and
an output module configured to judge whether the number of iterations over all sample images has reached a first threshold, and to output the trained neural network once the first threshold is reached.

11. An electronic device, comprising:
one or more processors; and
a storage device for storing executable instructions which, when executed by the processors, implement the neural network training method according to any one of claims 1 to 8.

12. A computer-readable storage medium having executable instructions stored thereon which, when executed by a processor, implement the neural network training method according to any one of claims 1 to 8.

13. A computer program product comprising a computer program which, when executed by a processor, implements the neural network training method according to any one of claims 1 to 8.
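To make the structure of the multi-scale fusion module in claim 3 concrete, here is a minimal PyTorch-style sketch. The kernel sizes, dilation rates, the 1x1 fusion convolution that restores the channel count after concatenation, and the choice of BatchNorm and ReLU are all illustrative assumptions; the claim fixes only the branch structure (n first convolutional layers whose features are concatenated, a second convolutional layer whose feature is added, then activation and normalization).

```python
import torch
import torch.nn as nn

class MultiScaleFusionBlock(nn.Module):
    """Sketch of one level of the multi-scale fusion module in claim 3."""

    def __init__(self, channels: int, n: int = 3):
        super().__init__()
        # n "first" convolutional layers; increasing dilation is an assumed
        # way to extract features at different scales.
        self.first_convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in range(1, n + 1)
        )
        # Assumed 1x1 convolution bringing the concatenated features back
        # to `channels` so they can be added to the second feature.
        self.fuse = nn.Conv2d(n * channels, channels, kernel_size=1)
        # The "second" convolutional layer of the residual branch.
        self.second_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the first features extracted at different scales
        # to form the first complex features.
        first_complex = self.fuse(torch.cat([c(x) for c in self.first_convs], dim=1))
        # Add the second feature to obtain the second complex features.
        second_complex = first_complex + self.second_conv(x)
        # Normalization and activation (the ordering here is our choice).
        return self.act(self.norm(second_complex))

# Example: one fusion level on a 64-channel feature map.
y = MultiScaleFusionBlock(64)(torch.randn(1, 64, 128, 128))
```

Per claim 2, adjacent blocks of this kind would then be chained with a strided convolution in the encoder and a transposed convolution of the same stride in the decoder.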
CN202110905429.5A 2021-08-06 2021-08-06 Neural network training method and device and image segmentation method Pending CN113822428A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110905429.5A CN113822428A (en) 2021-08-06 2021-08-06 Neural network training method and device and image segmentation method

Publications (1)

Publication Number Publication Date
CN113822428A true CN113822428A (en) 2021-12-21

Family

ID=78912983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110905429.5A Pending CN113822428A (en) 2021-08-06 2021-08-06 Neural network training method and device and image segmentation method

Country Status (1)

Country Link
CN (1) CN113822428A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961032A (en) * 2019-03-18 2019-07-02 北京字节跳动网络技术有限公司 Method and apparatus for generating disaggregated model
CN111325750A (en) * 2020-02-25 2020-06-23 西安交通大学 A medical image segmentation method based on multi-scale fusion U-chain neural network
CN112183258A (en) * 2020-09-16 2021-01-05 太原理工大学 A Road Segmentation Method Based on Context Information and Attention Mechanism in Remote Sensing Image
CN112364699A (en) * 2020-10-14 2021-02-12 珠海欧比特宇航科技股份有限公司 Remote sensing image segmentation method, device and medium based on weighted loss fusion network

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023123926A1 (en) * 2021-12-28 2023-07-06 苏州浪潮智能科技有限公司 Artificial intelligence task processing method and apparatus, electronic device, and readable storage medium
CN114359303B (en) * 2021-12-28 2024-12-24 浙江大华技术股份有限公司 Image segmentation method and device
CN114359303A (en) * 2021-12-28 2022-04-15 浙江大华技术股份有限公司 Image segmentation method and device
WO2023134550A1 (en) * 2022-01-14 2023-07-20 北京有竹居网络技术有限公司 Feature encoding model generation method, audio determination method, and related device
CN114399766A (en) * 2022-01-18 2022-04-26 平安科技(深圳)有限公司 Optical character recognition model training method, device, equipment and medium
CN114511478A (en) * 2022-01-18 2022-05-17 北京世纪好未来教育科技有限公司 Image processing method and device, electronic equipment and storage medium
CN114399766B (en) * 2022-01-18 2024-05-10 平安科技(深圳)有限公司 Optical character recognition model training method, device, equipment and medium
CN114492657A (en) * 2022-02-09 2022-05-13 深延科技(北京)有限公司 Plant disease classification method, device, electronic device and storage medium
CN114565615A (en) * 2022-02-18 2022-05-31 新疆大学 Polyp image segmentation method, device, computer equipment and storage medium
CN114662588A (en) * 2022-03-21 2022-06-24 合肥工业大学 A method, system, device and storage medium for automatically updating a model
CN114662588B (en) * 2022-03-21 2023-11-07 合肥工业大学 Method, system, equipment and storage medium for automatically updating model
CN115115828A (en) * 2022-04-29 2022-09-27 腾讯医疗健康(深圳)有限公司 Data processing method, apparatus, program product, computer equipment and medium
CN114970817A (en) * 2022-05-18 2022-08-30 北京百度网讯科技有限公司 Neural network training method and device and electronic equipment
CN115481694B (en) * 2022-09-26 2023-09-05 南京星环智能科技有限公司 Data enhancement method, device and equipment for training sample set and storage medium
CN115481694A (en) * 2022-09-26 2022-12-16 南京星环智能科技有限公司 Data enhancement method, device, equipment and storage medium for training sample set
CN115331082A (en) * 2022-10-13 2022-11-11 天津大学 Path generation method of tracking sound source, training method of model and electronic equipment
CN115331082B (en) * 2022-10-13 2023-02-03 天津大学 Path generation method of tracking sound source, training method of model and electronic equipment
CN115564966A (en) * 2022-10-17 2023-01-03 浙江网商银行股份有限公司 Image processing model training method and device
CN115496989B (en) * 2022-11-17 2023-04-07 南京硅基智能科技有限公司 Generator, generator training method and method for avoiding image coordinate adhesion
US12056903B2 (en) 2022-11-17 2024-08-06 Nanjing Silicon Intelligence Technology Co., Ltd. Generator, generator training method, and method for avoiding image coordinate adhesion
CN115984302A (en) * 2022-12-19 2023-04-18 中国科学院空天信息创新研究院 Multi-mode remote sensing image processing method based on sparse mixed expert network pre-training
CN115984302B (en) * 2022-12-19 2023-06-06 中国科学院空天信息创新研究院 Multimodal remote sensing image processing method based on sparse mixed expert network pre-training
CN116434007A (en) * 2023-03-31 2023-07-14 中信银行股份有限公司 A target detection model training method and system for small-scale images
CN116523028B (en) * 2023-06-29 2023-10-03 深圳须弥云图空间科技有限公司 Image characterization model training method and device based on image space position
CN116523028A (en) * 2023-06-29 2023-08-01 深圳须弥云图空间科技有限公司 Image characterization model training method and device based on image space position
CN117036355B (en) * 2023-10-10 2023-12-15 湖南大学 Encoder and model training method, fault detection method and related equipment
CN117036355A (en) * 2023-10-10 2023-11-10 湖南大学 Encoder and model training method, fault detection method and related equipment
WO2025139962A1 (en) * 2023-12-27 2025-07-03 苏州镁伽科技有限公司 Image processing model training method, and image processing method and apparatus
CN117853923A (en) * 2024-01-17 2024-04-09 山东盛然电力科技有限公司 Power grid power infrastructure safety evaluation analysis method and device
CN117830645A (en) * 2024-02-23 2024-04-05 中国科学院空天信息创新研究院 Feature extraction network training method, device, equipment and medium
CN118552136A (en) * 2024-07-26 2024-08-27 浪潮智慧供应链科技(山东)有限公司 Big data-based supply chain intelligent inventory management system and method

Similar Documents

Publication Publication Date Title
CN113822428A (en) Neural network training method and device and image segmentation method
US11775574B2 (en) Method and apparatus for visual question answering, computer device and medium
US11768876B2 (en) Method and device for visual question answering, computer apparatus and medium
CN111369581B (en) Image processing method, device, equipment and storage medium
CN109325972B (en) Laser radar sparse depth map processing method, device, equipment and medium
CN113469088B (en) SAR image ship target detection method and system under passive interference scene
WO2022105125A1 (en) Image segmentation method and apparatus, computer device, and storage medium
CN112016569B (en) Attention mechanism-based object detection method, network, device and storage medium
CN114611672B (en) Model training method, face recognition method and device
CN110929780A (en) Video classification model construction method, video classification device, video classification equipment and media
US11281928B1 (en) Querying semantic data from unstructured documents
CN111476719A (en) Image processing method, image processing device, computer equipment and storage medium
CN111311480B (en) Image fusion method and device
US12148131B2 (en) Generating an inpainted image from a masked image using a patch-based encoder
CN113408507B (en) Named Entity Recognition Method, Device and Electronic Device Based on History File
CN115115910A (en) Training method, usage method, device, equipment and medium of image processing model
CN113343981A (en) Visual feature enhanced character recognition method, device and equipment
CN117612188A (en) Text recognition model training method, text recognition device and equipment
CN117216715A (en) Data processing method, training method and device for deep learning model
CN108257081B (en) Method and device for generating pictures
CN119646255A (en) Cross-modal retrieval model training method and remote sensing image text retrieval method
CN113553386A (en) Embedding representation model training method, question answering method and device based on knowledge graph
CN112465737A (en) Image processing model training method, image processing method and image processing device
US20250131756A1 (en) Text recognition method and apparatus, storage medium and electronic device
CN114529750B (en) Image classification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211221
