CN115577768A - Semi-supervised model training method and device - Google Patents
- Publication number: CN115577768A
- Application number: CN202211215544.0A
- Authority: CN (China)
- Prior art keywords: data, model, enhancement, prediction result, transformation
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06N3/08: Physics; Computing; Computing arrangements based on biological models; Neural networks; Learning methods
- G06V10/764: Physics; Computing; Image or video recognition or understanding; Pattern recognition or machine learning; Classification, e.g. of video objects
- G06V10/774: Physics; Computing; Image or video recognition or understanding; Processing features in feature spaces; Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/82: Physics; Computing; Image or video recognition or understanding; Pattern recognition or machine learning; Using neural networks
- G06V20/70: Physics; Computing; Scenes; Scene-specific elements; Labelling scene content, e.g. deriving syntactic or semantic representations
Abstract
Description
Technical Field

The present application relates to the field of artificial intelligence, and in particular to a semi-supervised model training method and device.

Background

Semi-supervised learning refers to training a deep learning network model (also referred to in this application simply as a "model") with both labeled and unlabeled samples. Semi-supervised learning can effectively reduce the number of labeled samples required and thereby lower the cost of model training.

In semi-supervised learning, a Teacher model is usually first trained on a smaller set of labeled data; the Teacher model is then used to predict pseudo-labels for a larger set of unlabeled data; finally, the pseudo-labeled data is mixed with the labeled data to form the training data for a Student model, which is trained iteratively.
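The Teacher-Student pseudo-labeling pipeline described above can be sketched as follows (a minimal NumPy illustration; the linear "Teacher", its weights, and the random data are stand-ins, not the patent's actual networks):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Stand-in Teacher: a linear classifier assumed already trained on the small labeled set.
W_teacher = rng.normal(size=(8, 3))

# Large pool of unlabeled samples (8 features each).
x_unlabeled = rng.normal(size=(16, 8))

# Step 1: the Teacher predicts class probabilities for the unlabeled data.
probs = softmax(x_unlabeled @ W_teacher)

# Step 2: the most likely class becomes the pseudo-label.
pseudo_labels = probs.argmax(axis=1)

# Step 3 (not shown): mix (x_unlabeled, pseudo_labels) with the labeled data
# and train the Student model iteratively on the combined set.
```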
Because pseudo-labeled data contains a large amount of noise, a model that overfits to this noise will make incorrect predictions, and the resulting trained model will perform poorly.

Summary of the Invention

The present application provides a semi-supervised model training method and device for improving model performance, including prediction accuracy and generalization.
A first aspect of the present application provides a semi-supervised model training method, including: inputting first enhanced data into a first model to obtain a first prediction result for the first enhanced data, and inputting the first enhanced data into a second model to obtain a second prediction result for the first enhanced data, where the first enhanced data is obtained by processing first unlabeled data with a first data enhancement method, and the second model has the same structure as the first model but different initialization parameters; inputting second enhanced data into the first model to obtain a third prediction result for the second enhanced data, and inputting the second enhanced data into the second model to obtain a fourth prediction result for the second enhanced data, where the second enhanced data is obtained by processing the first unlabeled data with a second data enhancement method, and the first data enhancement method differs from the second data enhancement method; determining a first unsupervised loss according to the consistency between the first prediction result and the fourth prediction result; determining a second unsupervised loss according to the consistency between the second prediction result and the third prediction result; inputting first labeled data into the first model to obtain a fifth prediction result for the first labeled data; determining a first supervised loss according to the fifth prediction result and the label of the first labeled data; and updating the parameters of the first model based on the first unsupervised loss, the second unsupervised loss, and the first supervised loss.
With the semi-supervised model training method provided by the present application, the network takes labeled data and unlabeled data as input simultaneously; the unlabeled data is enhanced with a mix of different data enhancement methods, and a first model and a second model of identical structure are cross-trained against each other, with the unsupervised losses determined from the consistency of the two models' prediction results. This enables noise weighting and filtering, suppresses the influence of pseudo-label noise during semi-supervised training, and improves the generalization of the model.
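The cross-training scheme above can be sketched as follows (a minimal NumPy illustration; the linear heads, toy augmentations, and cross-entropy against the other model's pseudo-labels are illustrative stand-ins for the patent's models and consistency losses):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ce_to_pseudo(pred_prob, pseudo_labels):
    # Mean cross-entropy of predicted probabilities against integer pseudo-labels.
    n = pseudo_labels.shape[0]
    return -np.mean(np.log(pred_prob[np.arange(n), pseudo_labels] + 1e-12))

rng = np.random.default_rng(0)
x_u = rng.normal(size=(4, 8))                      # one batch of unlabeled data

# Two models with the same structure but different initialization parameters.
W1 = rng.normal(size=(8, 3))
W2 = rng.normal(size=(8, 3))

view1 = x_u + 0.01 * rng.normal(size=x_u.shape)    # first data enhancement (stand-in)
view2 = 1.05 * x_u                                 # second, different enhancement (stand-in)

p1 = softmax(view1 @ W1)   # first prediction:  model 1 on view 1
p2 = softmax(view1 @ W2)   # second prediction: model 2 on view 1
p3 = softmax(view2 @ W1)   # third prediction:  model 1 on view 2
p4 = softmax(view2 @ W2)   # fourth prediction: model 2 on view 2

# Cross supervision: model 1 is checked against model 2's prediction on the
# OTHER view, so the two unsupervised losses pair (p1, p4) and (p3, p2).
loss_u1 = ce_to_pseudo(p1, p4.argmax(axis=1))      # first unsupervised loss
loss_u2 = ce_to_pseudo(p3, p2.argmax(axis=1))      # second unsupervised loss
```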
In a possible implementation of the first aspect, the first model includes an image semantic segmentation model; the first data enhancement method is a non-color-domain enhancement method, where non-color-domain enhancement includes flip transformation, mirror transformation, translation transformation, or scale transformation; the second data enhancement method is a color-domain enhancement method, which includes brightness transformation, contrast transformation, or image Gaussian noise enhancement.

With the semi-supervised model training method provided by the present application, two different kinds of data enhancement can be applied to the unlabeled data when training a semantic segmentation model: non-color-domain enhancement and color-domain enhancement. Training the model on data processed with different enhancement methods increases the model's generalization performance and improves its robustness.
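The two families of enhancement can be illustrated on a raw image array (a minimal NumPy sketch; the image is random data and the parameter values are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))                      # toy H x W x C image, values in [0, 1]

# Non-color-domain enhancements: geometry changes, pixel values are preserved.
flipped    = img[::-1, :, :]                       # vertical flip
mirrored   = img[:, ::-1, :]                       # horizontal mirror
translated = np.roll(img, shift=4, axis=1)         # translation (wrap-around for simplicity)
scaled     = img[::2, ::2, :]                      # crude 2x downscale

# Color-domain enhancements: pixel values change, geometry is preserved.
brighter = np.clip(img * 1.3, 0.0, 1.0)            # brightness transformation
contrast = np.clip((img - 0.5) * 1.5 + 0.5, 0.0, 1.0)                 # contrast about mid-gray
noisy    = np.clip(img + rng.normal(0.0, 0.05, img.shape), 0.0, 1.0)  # Gaussian noise
```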
In another possible implementation, the first model is an image recognition model; the first data enhancement method is a non-color-domain enhancement method, where non-color-domain enhancement includes flip transformation, mirror transformation, translation transformation, or scale transformation; the second data enhancement method is a color-domain enhancement method, which includes brightness transformation, contrast transformation, or image Gaussian noise enhancement.

In a possible implementation of the first aspect, the first model includes an image semantic segmentation model, and obtaining the second enhanced data by processing the first unlabeled data with the second data enhancement method includes: obtaining the second enhanced data by copy-paste enhancement from the first unlabeled data and the first labeled data.
With the semi-supervised model training method provided by the present application, obtaining the second enhanced data by copy-paste enhancement from the first unlabeled data and the first labeled data exploits the relationship between labeled and unlabeled data, mitigates the excessive noise caused by inter-domain differences between datasets, and effectively improves the model's cross-domain adaptability.
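Copy-paste enhancement can be sketched as follows (a minimal NumPy illustration; the images, the rectangular object mask, and its position are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
labeled_img   = rng.random((32, 32, 3))            # labeled sample
unlabeled_img = rng.random((32, 32, 3))            # unlabeled sample

# Hypothetical object region taken from the labeled sample's annotation mask.
obj_mask = np.zeros((32, 32), dtype=bool)
obj_mask[8:16, 8:16] = True

# Paste the labeled object's pixels onto the unlabeled image.
augmented = unlabeled_img.copy()
augmented[obj_mask] = labeled_img[obj_mask]
```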
In a possible implementation of the first aspect, the first model is a target detection model; the first data enhancement method includes one or more of flip transformation, translation transformation, scale transformation, rotation transformation, and zoom transformation; and the second data enhancement method includes one or more of flip transformation, translation transformation, scale transformation, rotation transformation, and zoom transformation.

With the semi-supervised model training method provided by the present application, and considering that the detection result of a target detection model is usually a bounding box, the data is enhanced through transformations such as flip, translation, scale, rotation, or zoom, improving model performance.

In a possible implementation of the first aspect, before the first labeled data is input into the first model, the method further includes: performing enhancement processing on the first labeled data to obtain enhanced first labeled data, where the enhanced first labeled data is input into the first model to obtain the fifth prediction result.

The semi-supervised model training method provided by the present application can thus also apply data enhancement to the first labeled data, further improving the generalization of the first model.
In a possible implementation of the first aspect, the method further includes: inputting the first labeled data into the second model to obtain a sixth prediction result for the first labeled data; determining a second supervised loss according to the sixth prediction result and the label of the first labeled data; and updating the parameters of the second model based on the first unsupervised loss, the second unsupervised loss, and the second supervised loss.

The semi-supervised model training method provided by the present application can thus train the second model synchronously; obtaining the second and fourth prediction results from the updated second model improves the reliability of the predictions.
In a possible implementation of the first aspect, inputting the first enhanced data into the first model to obtain the first prediction result, and inputting the first enhanced data into the second model to obtain the second prediction result, includes: inputting the first enhanced data into a preset feature extraction network to obtain first feature data; and inputting the first feature data into the first model to obtain the first prediction result, and inputting the first feature data into the second model to obtain the second prediction result.

In another possible implementation, inputting the second enhanced data into the first model to obtain the third prediction result, and inputting the second enhanced data into the second model to obtain the fourth prediction result, includes: inputting the second enhanced data into the preset feature extraction network to obtain second feature data; and inputting the second feature data into the first model to obtain the third prediction result, and inputting the second feature data into the second model to obtain the fourth prediction result.
The semi-supervised model training method provided by the present application extracts feature data from the enhanced data through a feature extraction network and then feeds that feature data into both the first model and the second model. This reduces the amount of data processing and data storage and shortens training time, improving training efficiency. In addition, feeding the same feature data into different model segmentation heads trains the model on data from two different views, further improving the model's generalization.
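Sharing one backbone pass between the two heads can be sketched as follows (a minimal NumPy illustration; the linear backbone and heads are stand-ins for the preset feature extraction network and the two models):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))              # one batch of enhanced data

W_feat  = rng.normal(size=(16, 8))        # shared (preset) feature extraction network
W_head1 = rng.normal(size=(8, 3))         # first model's head
W_head2 = rng.normal(size=(8, 3))         # second model's head: same shape, different init

feats = np.maximum(x @ W_feat, 0.0)       # features computed ONCE per enhanced view

out1 = feats @ W_head1                    # first model's prediction
out2 = feats @ W_head2                    # second model reuses feats: no second backbone pass
```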
In a possible implementation of the first aspect, the first model is an image semantic segmentation model, and determining the first unsupervised loss according to the consistency between the first prediction result and the fourth prediction result includes: determining a first pseudo-label corresponding to the first prediction result; if the first predicted probability corresponding to a first pixel position in the fourth prediction result is greater than the second predicted probability at the first pixel position in the first prediction result, setting the weight of the first pseudo-label to 1, and if the first predicted probability is less than or equal to the second predicted probability, setting the weight of the first pseudo-label to 0; and determining the first unsupervised loss according to the first prediction result and the first pseudo-label.

By determining pseudo-labels through this probability comparison, the semi-supervised model training method provided by the present application improves the reliability of the pseudo-labels and the prediction accuracy of the trained model.
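The per-pixel weighting rule can be sketched as follows (a minimal NumPy illustration on a tiny 4x4 "image" with 3 classes; taking each model's maximum class probability as its per-pixel confidence is an assumption of this sketch):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
p1 = softmax(rng.normal(size=(4, 4, 3)))   # first prediction (model 1): per-pixel class probs
p4 = softmax(rng.normal(size=(4, 4, 3)))   # fourth prediction (model 2)

pseudo = p4.argmax(axis=-1)                # pseudo-labels taken from model 2

# Weight 1 where model 2 is more confident than model 1 at that pixel, else 0.
w = (p4.max(axis=-1) > p1.max(axis=-1)).astype(float)

# Weighted per-pixel cross-entropy of model 1's prediction against the pseudo-labels.
ce = -np.log(np.take_along_axis(p1, pseudo[..., None], axis=-1)[..., 0] + 1e-12)
loss_u1 = (w * ce).sum() / max(w.sum(), 1.0)
```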
A second aspect of the present application provides a semi-supervised model training device, including: a processing unit, configured to input first enhanced data into a first model to obtain a first prediction result for the first enhanced data, and to input the first enhanced data into a second model to obtain a second prediction result for the first enhanced data, where the first enhanced data is obtained by processing first unlabeled data with a first data enhancement method, and the second model has the same structure as the first model but different initialization parameters; the processing unit being further configured to input second enhanced data into the first model to obtain a third prediction result for the second enhanced data, and to input the second enhanced data into the second model to obtain a fourth prediction result for the second enhanced data, where the second enhanced data is obtained by processing the first unlabeled data with a second data enhancement method, and the first data enhancement method differs from the second data enhancement method; a determining unit, configured to determine a first unsupervised loss according to the consistency between the first prediction result and the fourth prediction result, and further configured to determine a second unsupervised loss according to the consistency between the second prediction result and the third prediction result; the processing unit being further configured to input first labeled data into the first model to obtain a fifth prediction result for the first labeled data; the determining unit being further configured to determine a first supervised loss according to the fifth prediction result and the label of the first labeled data; and an updating unit, configured to update the parameters of the first model based on the first unsupervised loss, the second unsupervised loss, and the first supervised loss.
In a possible implementation of the second aspect, the first model includes an image semantic segmentation model; the first data enhancement method is a non-color-domain enhancement method, where non-color-domain enhancement includes flip transformation, mirror transformation, translation transformation, or scale transformation; the second data enhancement method is a color-domain enhancement method, which includes brightness transformation, contrast transformation, or image Gaussian noise enhancement.

In a possible implementation of the second aspect, the first model includes an image semantic segmentation model, and obtaining the second enhanced data by processing the first unlabeled data with the second data enhancement method includes: obtaining the second enhanced data by copy-paste enhancement from the first unlabeled data and the first labeled data.

In a possible implementation of the second aspect, the first model is a target detection model; the first data enhancement method includes one or more of flip transformation, translation transformation, scale transformation, rotation transformation, and zoom transformation; and the second data enhancement method includes one or more of flip transformation, translation transformation, scale transformation, rotation transformation, and zoom transformation.

In a possible implementation of the second aspect, the processing unit is further configured to: perform enhancement processing on the first labeled data to obtain enhanced first labeled data, where the enhanced first labeled data is input into the first model to obtain the fifth prediction result.

In a possible implementation of the second aspect, the processing unit is further configured to input the first labeled data into the second model to obtain a sixth prediction result for the first labeled data; the determining unit is further configured to determine a second supervised loss according to the sixth prediction result and the label of the first labeled data; and the updating unit is further configured to update the parameters of the second model based on the first unsupervised loss, the second unsupervised loss, and the second supervised loss.

In a possible implementation of the second aspect, the processing unit is specifically configured to: input the first enhanced data into a preset feature extraction network to obtain first feature data; and input the first feature data into the first model to obtain the first prediction result, and input the first feature data into the second model to obtain the second prediction result.

In a possible implementation of the second aspect, the first model is an image semantic segmentation model, and the processing unit is specifically configured to: determine a first pseudo-label corresponding to the first prediction result; if the first predicted probability corresponding to a first pixel position in the fourth prediction result is greater than the second predicted probability at the first pixel position in the first prediction result, set the weight of the first pseudo-label to 1, and if the first predicted probability is less than or equal to the second predicted probability, set the weight of the first pseudo-label to 0; and determine the first unsupervised loss according to the first prediction result and the first pseudo-label.
A third aspect of the present application provides a semi-supervised model training device, including: a memory storing computer-readable instructions; and a processor connected to the memory, where the computer-readable instructions, when executed by the processor, cause the semi-supervised model training device to implement the method of the first aspect or any of its possible implementations.

A fourth aspect of the present application provides a computer program product including computer-readable instructions which, when run on a computer, cause the computer to perform the method of the first aspect or any of its possible implementations.

A fifth aspect of the present application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the method of the first aspect or any of its possible implementations.

A sixth aspect of the present application provides a chip including a processor. The processor is configured to read and execute a computer program stored in a memory so as to perform the method in any possible implementation of any of the above aspects. Optionally, the chip includes the memory, and the memory is connected to the processor through a circuit or wires. Further optionally, the chip also includes a communication interface connected to the processor. The communication interface is used to receive data and/or information to be processed; the processor obtains the data and/or information from the communication interface, processes it, and outputs the processing result through the communication interface. The communication interface may be an input/output interface.

For the technical effects of the second, third, fourth, fifth, and sixth aspects and any of their implementations, reference may be made to the technical effects of the corresponding implementations of the first aspect, which are not repeated here.
With the semi-supervised model training method provided by the present application, unlabeled data is enhanced with a mix of different data enhancement methods, a first model and a second model of identical structure are cross-trained against each other, and the unsupervised losses are determined from the consistency of the two models' prediction results. This enables noise weighting and filtering, suppresses the influence of pseudo-label noise during semi-supervised training, and improves model performance, including prediction accuracy and generalization.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of an architecture of a semi-supervised model training method;

Fig. 2 is a schematic diagram of a system architecture provided by an embodiment of the present application;

Fig. 3 is a schematic diagram of one embodiment of the semi-supervised model training method in an embodiment of the present application;

Fig. 4 is a schematic diagram of the architecture of a semi-supervised model training device in an embodiment of the present application;

Fig. 5 is a schematic diagram of obtaining a semi-supervised loss in an embodiment of the present application;

Fig. 6 is a schematic diagram of another embodiment of the semi-supervised model training method in an embodiment of the present application;

Fig. 7 is a schematic diagram of one embodiment of a semi-supervised model training device in an embodiment of the present application;

Fig. 8 is a schematic diagram of another embodiment of the semi-supervised model training device in an embodiment of the present application.
Detailed Description
The present application provides a semi-supervised model training method and device for improving model generalization.

Embodiments of the present application are described below with reference to the accompanying drawings. Apparently, the described embodiments are only some, not all, of the embodiments of the present application. Those of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of the present application remain applicable to similar technical problems.

The terms "first", "second", and the like in the specification, claims, and drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms so used are interchangeable where appropriate, so that the embodiments described here can be practiced in sequences other than those illustrated or described. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or modules is not necessarily limited to those expressly listed, but may include other steps or modules not explicitly listed or inherent to the process, method, product, or device. The naming or numbering of steps in this application does not mean that the steps in a method flow must be executed in the temporal or logical order indicated by the naming or numbering; named or numbered steps may be executed in a different order for the technical purpose to be achieved, as long as the same or a similar technical effect is achieved.
Semantic segmentation is a popular direction in computer vision and digital image processing. As a basic component of many scene understanding systems, it divides an image or video into multiple parts or objects, and is widely used in robot navigation, intelligent video surveillance, safe cities, medical image analysis, assisted driving, augmented reality, and many other fields; reducing the consumption of human capital through computer vision is of great practical significance. Semantic segmentation has therefore become a research hotspot in theory and application in recent years, and with the wide application of deep learning, semantic segmentation algorithms have developed rapidly.
Embodiments of the present application are described below with reference to the accompanying drawings.
The development of deep learning networks relies on large-scale data samples with fine-grained annotation labels. Manual annotation usually consumes considerable human and financial resources, which greatly limits the application of semantic segmentation algorithms. In practical scenarios, limited labeled samples cause the deep learning network to overfit during training and fail to effectively distinguish samples of different categories, ultimately leading to incorrect predictions by the network. Expensive annotation costs, large manpower investment, and long algorithm development cycles are unfavorable for commercial applications. Semi-supervised learning therefore aims to train a deep learning network with a small amount of labeled data combined with a large amount of unlabeled data, thereby improving the network's generalization performance, effectively mining the relationship between labeled and unlabeled data, and enhancing the stability and robustness of the network in practical applications. Please refer to FIG. 1, a schematic architecture diagram of the semi-supervised model training method. Training the model with a supervised loss obtained from labeled data and an unsupervised loss obtained from unlabeled data reduces the amount of labeled data required, and thus the training cost, compared with fully supervised training methods.
The semantic segmentation network mentioned in this application refers to a semantic segmentation network based on deep learning.
In traditional semi-supervised training methods, pseudo-label data contains a large amount of noise, so large amounts of noisy data are introduced into training in later iterations. If the model overfits to this noise, its generalization will be poor.
One approach dynamically trains two semantic segmentation networks jointly: the loss function weights are dynamically updated by comparing the outputs of the two networks, and a multi-model voting mechanism exploiting the prediction inconsistency between them is used to correct the pseudo-labels. This method can improve model generalization to a certain extent, but is still insufficient. Moreover, with a relatively large amount of data, the multi-round iteration mechanism makes training time-consuming and storage-intensive.
Embodiments of the present application provide a semi-supervised model training method and device. The method can be applied in fields that require semantic segmentation, such as autonomous driving, safe cities, and mobile terminals, as well as in mobile phone photography and security scenarios.
For example, semantic segmentation plays a very important role in autonomous driving, with significance for mapping and localization, road feature analysis, and so on. However, because multi-class segmentation annotation is costly, autonomous driving requires high-quality segmentation results to provide reliable decisions for the back end. The semi-supervised model training method provided by this application can effectively reduce the amount of labeled data required, and also noticeably alleviates the degradation in segmentation performance caused by distribution differences between domains.
Features of current mobile phone photography, such as background blurring and portrait matting, all depend on semantic segmentation. For users, however, shooting scenes, devices, and environments vary greatly, so targeted updating and optimization of segmentation system performance is both necessary and urgent. The semi-supervised model training method provided by this application can improve the accuracy of the algorithm and the generalization of the model without adding labeled data.
The system architecture provided by the embodiments of the present application is introduced below.
Referring to FIG. 2, an embodiment of the present application provides a system architecture 200. As shown in system architecture 200, a data collection device 260 may be used to collect training data. After the data collection device 260 collects the training data, the training data is stored in a database 230, and a training device 220 trains a target model/rule 201 based on the training data maintained in the database 230.
The following describes how the training device 220 obtains the target model/rule 201 based on the training data. Exemplarily, the training device 220 processes multiple frames of sample images, outputs the corresponding predicted labels, and calculates the loss between the predicted labels and the samples' original labels. The classification network is updated based on this loss until the predicted label is close to the sample's original label, or the difference between the predicted label and the original label is smaller than a threshold, thereby completing the training of the target model/rule 201. A label may be an annotation box involved in the embodiments of the present application. For a detailed description, see the training method described later.
The target model/rule 201 in the embodiments of the present application may specifically be a neural network. It should be noted that, in practical applications, the training data maintained in the database 230 does not necessarily all come from the data collection device 260; it may also be received from other devices. The collected data may be annotated and enhanced to obtain the training data. It should also be noted that the training device 220 does not necessarily train the target model/rule 201 entirely based on the training data maintained in the database 230; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of the present application.
The target model/rule 201 trained by the training device 220 can be applied to different systems or devices, such as the execution device 210 shown in FIG. 2. The execution device 210 may be a terminal, such as a mobile phone, a tablet computer, a laptop, an augmented reality (AR)/virtual reality (VR) device, a vehicle-mounted terminal, or a television, or may be a server, the cloud, or the like. In FIG. 2, the execution device 210 is configured with a transceiver 212, which may include an input/output (I/O) interface or another wireless or wired communication interface for exchanging data with external devices. Taking the I/O interface as an example, a user may input data to the I/O interface through a client device 240.
When the execution device 210 preprocesses the input data, or when the computing module 111 of the execution device 210 performs computation or other related processing, the execution device 210 may call data, code, and the like in a data storage system 250 for the corresponding processing, and may also store the data, instructions, and the like obtained from that processing in the data storage system 250.
Finally, the I/O interface 212 returns the processing result to the client device 240 and thus provides it to the user.
It is worth noting that the training device 220 can generate corresponding target models/rules 201 based on different training data for different goals, or different tasks; the corresponding target model/rule 201 can then be used to achieve the above goal or complete the above task, thereby providing the user with the desired result.
In the case shown in FIG. 2, the user can manually specify the input data, and this manual specification can be operated through an interface provided by the transceiver 212. Alternatively, the client device 240 can automatically send input data to the transceiver 212; if the user's authorization is required for the client device 240 to send input data automatically, the user can set the corresponding permission in the client device 240. The user can view the result output by the execution device 210 on the client device 240, and the specific presentation may be display, sound, action, or another concrete form. The client device 240 can also serve as a data collection side, collecting the input data fed to the transceiver 212 and the output results of the transceiver 212 as shown in the figure as new sample data, and storing them in the database 230. Of course, collection may also bypass the client device 240, with the transceiver 212 directly storing the input data fed to it and its output results in the database 230 as new sample data.
It should be noted that FIG. 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships between the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in FIG. 2 the data storage system 250 is external memory relative to the execution device 210; in other cases, the data storage system 250 may also be placed inside the execution device 210.
As shown in FIG. 2, the target model/rule 201 is obtained by training with the training device 220. In the embodiments of the present application, the target model/rule 201 may be the first model or the second model involved in this application, for example a semantic segmentation model or a target recognition model for image processing.
Embodiments of the present application provide a semi-supervised model training method and device for improving model generalization, and also for improving model prediction accuracy. Referring to FIG. 3, an embodiment of the present application proposes a semi-supervised model training method.
The method includes:
301. Input first enhanced data into a first model to obtain a first prediction result of the first enhanced data, and input the first enhanced data into a second model to obtain a second prediction result of the first enhanced data.
302. Input second enhanced data into the first model to obtain a third prediction result of the second enhanced data, and input the second enhanced data into the second model to obtain a fourth prediction result of the second enhanced data.
In the semi-supervised model training method of the present application, the training data involved includes labeled data and unlabeled data, where the unlabeled data must undergo two different enhancement processes before being input into the models for inference. The first unlabeled data is processed by a first data enhancement method to obtain the first enhanced data, and by a second data enhancement method to obtain the second enhanced data. It should be noted that the first data enhancement method is different from the second data enhancement method.
The second model has the same structure as the first model but different initialization parameters. For example, the first model and the second model are both semantic segmentation models, or the first model and the second model are both target recognition models.
The specific types of the first and second enhancement methods are related to the prediction functions of the first and second models, and there are many possibilities:
In one possible implementation, the first model includes an image semantic segmentation model. The first data enhancement method is a non-color-domain enhancement method, where non-color-domain enhancement includes flip transformation, mirror transformation, translation transformation, or scale transformation; the second data enhancement method is a color-domain enhancement method, which includes brightness transformation, contrast transformation, or image Gaussian noise enhancement.
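As an illustration, the non-color-domain ("weak") and color-domain ("strong") enhancements described above can be sketched as follows. The flip probability, jitter ranges, and noise scale are illustrative assumptions, not values specified by this application.

```python
import numpy as np

rng = np.random.default_rng(0)

def weak_augment(img):
    # Non-color-domain enhancement: a random horizontal flip.
    # Geometry changes, but pixel values are untouched.
    return img[:, ::-1, :] if rng.random() < 0.5 else img

def strong_augment(img):
    # Color-domain enhancement: random contrast and brightness jitter
    # plus Gaussian pixel noise; geometry is unchanged.
    out = img.astype(np.float64)
    out = out * rng.uniform(0.8, 1.2)             # contrast (assumed range)
    out = out + rng.uniform(-20.0, 20.0)          # brightness (assumed range)
    out = out + rng.normal(0.0, 5.0, out.shape)   # Gaussian noise (assumed scale)
    return np.clip(out, 0, 255).astype(np.uint8)

img = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
weak = weak_augment(img)
strong = strong_augment(img)
```

Note that the weak branch preserves the multiset of pixel values (only their positions may change), while the strong branch perturbs the values themselves, matching the non-color/color distinction above.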
In another possible implementation, the first model is a target detection model. The first data enhancement method includes one or more of flip, translation, scale, rotation, and zoom transformations; the second data enhancement method likewise includes one or more of flip, translation, scale, rotation, and zoom transformations.
That is, the first unlabeled data is enhanced along different types of directions by the first and second data enhancement methods. Training the model based on the processed first and second enhanced data can improve model generalization.
In another possible implementation, to increase the correlation between labeled-data training and unlabeled data and reduce the difference between data domains, the second enhanced data is obtained by processing the first unlabeled data with the second data enhancement method; specifically, the second enhanced data is obtained by copy-paste enhancement based on the first unlabeled data and the first labeled data. In one implementation, copy-paste enhancement can be understood as randomly cropping a portion of a labeled image and pasting it into the unlabeled data. Using this enhancement method to address the problem of inconsistent data domains can further improve the model's prediction accuracy.
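A minimal sketch of the cross-set copy-paste idea: a randomly placed rectangle is cropped from a labeled image and pasted into an unlabeled image at the same location. The crop size and placement scheme are assumptions for illustration; the application does not fix them.

```python
import numpy as np

rng = np.random.default_rng(1)

def copy_paste(unlabeled_img, labeled_img):
    # Crop a random rectangle from the labeled image and paste it into
    # the unlabeled image at the same position.
    h, w = labeled_img.shape[:2]
    ch, cw = h // 2, w // 2                       # crop half the image (assumed)
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    out = unlabeled_img.copy()
    out[top:top + ch, left:left + cw] = labeled_img[top:top + ch, left:left + cw]
    return out

unlabeled = np.zeros((8, 8, 3), dtype=np.uint8)    # stand-in unlabeled image
labeled = np.full((8, 8, 3), 255, dtype=np.uint8)  # stand-in labeled image
mixed = copy_paste(unlabeled, labeled)
```

In a full pipeline the same crop would also be applied to the labeled image's annotation so the pasted region carries its class labels; that bookkeeping is omitted here.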
The first enhanced data is input into the first model and the second model to obtain the first prediction result and the second prediction result respectively; the second enhanced data is input into the first model and the second model to obtain the third prediction result and the fourth prediction result respectively.
When the first model and the second model run inference on data, both require feature extraction. Considering that, when the amount of training data is large, running complete inference separately through the first and second models would consume considerable computing and storage resources, in one possible implementation the first enhanced data is input into a preset feature extraction network to obtain first feature data; the first feature data is then input into the first model to obtain the first prediction result, and into the second model to obtain the second prediction result. This reduces computing and storage resource consumption.
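The shared feature extraction can be sketched as follows: the feature extractor runs once per batch, and two heads of identical structure but different initializations consume the same features. The shapes and the tanh/softmax layers are illustrative assumptions standing in for the real networks.

```python
import numpy as np

rng = np.random.default_rng(2)

W_enc = rng.normal(size=(16, 8))    # shared feature-extractor weights
W_head1 = rng.normal(size=(8, 3))   # same structure,
W_head2 = rng.normal(size=(8, 3))   # different initialization

def encoder(x):
    # Shared feature extraction: computed once, reused by both heads.
    return np.tanh(x @ W_enc)

def head(features, W):
    # A prediction head reduced to a linear layer plus softmax.
    logits = features @ W
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

x = rng.normal(size=(4, 16))        # a batch of 4 input vectors
feats = encoder(x)                  # single forward pass through the extractor
pred1 = head(feats, W_head1)        # "first model" prediction
pred2 = head(feats, W_head2)        # "second model" prediction
```

Because `encoder` runs once instead of twice, the extractor's compute and activation storage are shared, which is the saving described above.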
303. Determine a first unsupervised loss according to the consistency between the first prediction result and the fourth prediction result.
304. Determine a second unsupervised loss according to the consistency between the second prediction result and the third prediction result.
The prediction result of a semantic segmentation model is the predicted value and confidence of each pixel in the image. If, in the first prediction result, the confidence that the j-th pixel belongs to the i-th class is higher than in the fourth prediction result, then the first unsupervised loss is determined based on the loss value of the first prediction result; otherwise it is 0.
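This confidence comparison can be sketched per pixel as below: each row is one pixel's class probabilities from the two prediction results, and the cross-entropy against the other result's pseudo-label is kept only where the first result is the more confident. The direction of the comparison follows this paragraph; the exact loss form is an assumption.

```python
import numpy as np

def gated_unsup_loss(pred1, pred4):
    # pred1, pred4: (N, C) per-pixel class probabilities from the two heads.
    conf1 = pred1.max(axis=1)               # confidence of the first prediction
    conf4 = pred4.max(axis=1)               # confidence of the fourth prediction
    pseudo = pred4.argmax(axis=1)           # pseudo-label from the other head
    ce = -np.log(pred1[np.arange(len(pred1)), pseudo] + 1e-8)
    weight = (conf1 > conf4).astype(float)  # keep pixels where pred1 is more confident
    return float((weight * ce).mean())      # gated pixels contribute 0

pred1 = np.array([[0.6, 0.4], [0.9, 0.1]])
pred4 = np.array([[0.8, 0.2], [0.5, 0.5]])
loss = gated_unsup_loss(pred1, pred4)
```

Here only the second pixel (confidence 0.9 > 0.5) contributes to the loss; the first pixel is gated out because its confidence 0.6 does not exceed 0.8.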
305. Input first labeled data into the first model to obtain a fifth prediction result of the first labeled data.
The first labeled data may be, for example, an RGB-channel image together with the corresponding segmentation label. The first labeled data is input into the first model to obtain the fifth prediction result of the first labeled data.
Before the first labeled data is input into the first model, the method further includes: performing enhancement processing on the first labeled data to obtain enhanced first labeled data, which is then input into the first model to obtain the fifth prediction result. Enhancing the first labeled data in this application can further strengthen the generalization of the first model.
The method for performing enhancement processing on the first labeled data is not specifically limited.
306. Determine a first supervised loss according to the fifth prediction result and the label of the first labeled data.
The first supervised loss can be determined from the difference between the annotation in the first labeled data and the fifth prediction result.
307. Update the parameters of the first model based on the first unsupervised loss, the second unsupervised loss, and the first supervised loss.
The parameters of the first model are updated according to the above first unsupervised loss, second unsupervised loss, and first supervised loss, thereby training the first model.
Optionally, the first labeled data is input into the second model to obtain a sixth prediction result of the first labeled data; a second supervised loss is determined according to the sixth prediction result and the label of the first labeled data; and the parameters of the second model are updated based on the first unsupervised loss, the second unsupervised loss, and the second supervised loss. In this way, the second model can be trained synchronously, and obtaining the second and fourth prediction results from the updated second model can improve the reliability of prediction.
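Steps 303 to 307 combine into a single objective per model. A sketch, assuming a simple additive combination with a balancing coefficient λ that this application does not specify:

```python
def total_loss(supervised, unsup1, unsup2, lam=1.0):
    # Supervised loss on labeled data plus the two consistency losses.
    # `lam` is an assumed balancing coefficient, not a value from the text.
    return supervised + lam * (unsup1 + unsup2)

# Illustrative loss values for one training step of the first model.
loss_model1 = total_loss(supervised=0.40, unsup1=0.10, unsup2=0.05)
```

The optional second-model update uses the same form with the second supervised loss in place of the first.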
The semi-supervised model training method provided by this application makes effective use of unlabeled data, greatly saving annotation cost.
Referring to FIG. 4, the architecture of the semi-supervised model training device provided by the embodiments of the present application is introduced below.
The architecture consists of: a weak enhancement module 101, a strong enhancement module 102, a feature extraction module 103, a semantic segmentation head module 104, a semantic segmentation head module 105, and an uncertainty guided re-weight module (UGRM) 106.
It should be noted that the semantic segmentation head module 105 and the semantic segmentation head module 104 have the same structure but different initialization parameters.
Weak enhancement module 101: for labeled data, non-color-domain enhancement is applied to both the data and the annotations, for example random horizontal flipping (flipping the image in the horizontal direction), random mirroring (mirroring the image about the vertical axis), and so on.
Enhanced data = random(horizontal image flip, horizontal image mirror).
The weak enhancement module 101 can be used to enhance labeled data, or to enhance unlabeled data; the latter corresponds to processing with the first data enhancement method in the foregoing embodiments, yielding the first enhanced data.
Strong enhancement module 102: performs enhancement processing on unlabeled data through color-domain enhancements such as random image brightness changes, random image contrast changes, and image Gaussian noise. Optionally, the strong enhancement module 102 may also perform cross-set copy-paste on images across datasets (for example, between labeled data and unlabeled data). Image Gaussian noise enhancement randomly adds noise following a Gaussian distribution to the pixels of the image.
It should be noted that the weak enhancement module 101 is used to process the first unlabeled data with the first data enhancement method to obtain the first enhanced data of the foregoing embodiments; similarly, the strong enhancement module 102 is used to process the first unlabeled data with the second data enhancement method to obtain the second enhanced data of the foregoing embodiments.
Feature extraction module 103: the semantic segmentation feature extractor, which mainly uses operations such as convolution and pooling to extract features from the image. It should be noted that a semantic segmentation network generally consists of two parts, an encoder and a decoder, each composed of a series of operations such as convolution, pooling, and upsampling. The feature extraction module 103 corresponds to the encoder part of semantic segmentation; the encoder's role is to repeatedly convolve an image of size C×H×W down to C'×H'×W', where C&lt;C', H&gt;H', and W&gt;W', i.e., the image is compressed along the H and W dimensions while the C dimension is expanded.
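The shape transformation C×H×W → C'×H'×W' described above can be tracked as follows; the number of stages and the per-stage channel multiplier are illustrative assumptions, not values fixed by this application.

```python
def encoder_shape(c, h, w, stages=3, width_mult=2):
    # Each encoder stage halves the spatial dimensions and widens the
    # channels, so afterwards C < C', H > H', W > W'.
    for _ in range(stages):
        c, h, w = c * width_mult, h // 2, w // 2
    return c, h, w

c2, h2, w2 = encoder_shape(3, 512, 512)  # e.g. a 3-channel 512x512 input image
```

With these assumed settings a 3×512×512 image yields a 24×64×64 feature map; the decoder (heads 104/105 below) then upsamples this back toward the input resolution.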
Semantic segmentation head module 104 (Head1): corresponds to the decoder part of semantic segmentation, used to upsample the features of the feature extraction module 103 back to a specified size; the decoder's role is to restore the image downsampled by the encoder to a certain size (upsampling).
Semantic segmentation head module 105 (Head2): similar to the semantic segmentation head 104. It should be noted that Head1 and Head2 are segmentation heads (which can be understood as decoders) of the same structure but different initializations.
Head1's weakly enhanced unlabeled-data prediction (Pred1, weak') and Head2's weakly enhanced unlabeled-data prediction (Pred2, weak') are the predictions of the different heads on the labeled data.
It should be noted that the semantic segmentation head module 104 in this embodiment corresponds to the first model in the foregoing embodiments; similarly, the semantic segmentation head module 105 corresponds to the second model in the foregoing embodiments.
UGRM module 106: used to calculate the different uncertainties with which the same unlabeled image is responded to after passing through module 104 and module 105. As described above, each head has three outputs, namely:
a. the prediction result obtained when a labeled RGB image is weakly enhanced by module 101 and input into the network;
b. the prediction result obtained when an unlabeled RGB image is weakly enhanced by module 101 and input into the network;
c. the prediction result obtained when an unlabeled RGB image is strongly enhanced by module 102 and input into the network.
Output b of Head1 (denoted head1_b) and output c of Head2 (denoted head2_c) are the predictions of the different segmentation head modules on unlabeled data passed through the weak enhancement module 101 and the strong enhancement module 102 respectively, and the two prediction results are not entirely consistent with each other. Referring to FIG. 5, the difference between the prediction result output by the first segmentation head and the prediction result output by the second segmentation head can be seen.
In this application, head2_c output by the second model is not directly used as a pseudo-label to determine the loss function of head1_b; instead, judgment and weighting are performed based on the credibility of the prediction results. The specific weighting is as follows:
If ω_{2,i,j} > ω_{1,i,j}, then u_{i,j} = 1;
If ω_{2,i,j} ≤ ω_{1,i,j}, then u_{i,j} = 0.
Here the value of m distinguishes the segmentation heads; ω_{m,i,j} denotes the maximum, over the classes c, of the probability that the pixel in row i and column j output by the m-th segmentation head belongs to class c, i.e. ω_{m,i,j} = max_c P_{m,i,j}(c), where P_{m,i,j}(c) is the probability of belonging to each class and C is the total number of predicted classes. u_{i,j} denotes the corrected weight: for the prediction at the pixel in row i, column j, if the predicted confidence output by the second segmentation head is greater than that output by the first segmentation head, the weight is 1; otherwise it is 0.
The unsupervised loss can be obtained from the prediction results weighted by the UGRM module 106; combined with the supervised loss obtained from the labeled data, the semantic segmentation head module 104 and the semantic segmentation head module 105 can be updated.
Functional modules are divided based on the architecture of the semi-supervised model training device of FIG. 4. Please refer to FIG. 6, a schematic diagram of another embodiment of the semi-supervised model training method in the embodiments of the present application.
601. Perception information input module.
This module is mainly responsible for inputting the perception information stream, serving as the signal input module of the whole system. Exemplarily, its input is the data collected by a vehicle-mounted camera: if the data includes a classification label for each pixel, it is labeled data; if it includes no labels, it is unlabeled data.
Labeled data/unlabeled data: as the signal input of the whole system, this module's input may be image information.
Labeled data is an RGB-channel picture together with the corresponding segmentation label, i.e., the RGB data collected by the camera and, for each pixel of the corresponding RGB image, the predefined category to which it belongs (for example, person, vehicle, lane line, road marking, or sky); unlabeled data is simply the RGB data collected by the camera.
602. Strong enhancement and weak enhancement modules.
This module is mainly responsible for data enhancement of the input information, including weak non-color-space enhancement and strong color-space enhancement.
603. Feature extraction module.
The feature extraction module is mainly responsible for feeding the enhanced images (labeled data and unlabeled data after strong or weak enhancement) into the encoder to extract features. The encoder part reduces the spatial resolution of the input by downsampling, generating a low-resolution feature map; feature extraction makes computation efficient and effectively distinguishes different categories.
604. Segmentation decoder module.
This includes module 104 and module 105. This part upsamples the features extracted by the feature extraction module to a certain size and outputs the result as the semantic segmentation result. The decoder upsamples these feature descriptions, restores the image resolution, and remaps the obtained features to every pixel in the image for per-pixel classification.
The output of each head (module 104 and module 105 are each a head) has three results: a. the prediction result of a labeled RGB image weakly enhanced by module 101 and input into the network; b. the prediction result of an unlabeled RGB image weakly enhanced by module 101 and input into the network; c. the prediction result of an unlabeled RGB image strongly enhanced by module 102 and input into the network. The two heads therefore have six outputs in total.
605. UGRM module.
This module makes a judgment on, and applies a weighting to, the inconsistency between the predictions of head1_b and head2_c.
For example, for the prediction result at the i-th position, the predictions of head1_b and head2_c admit two possibilities. First, the prediction result of head1_b is the same as that of head2_c, both being class k; in that case the loss weight for head1_b is its confidence. Second, the prediction result of head1_b is inconsistent with that of head2_c; in that case the weight for head1_b is 0. For the specific processing method, reference may be made to the UGRM module 106 in FIG. 4, which is not repeated here.
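The per-position weighting rule just described can be sketched as follows. This is an illustrative reading, not the patent's implementation: the function name is invented, and "its confidence" is read here as head1_b's own confidence at the agreed class, although the original phrasing could also be read as head2_c's confidence:

```python
import numpy as np

def ugrm_weights(probs_head1_b, probs_head2_c):
    """Per-position loss weights for head1_b, guided by head2_c.

    probs_*: (N, C) softmax outputs. Where the argmax classes agree,
    the weight is the confidence at that class; where they disagree,
    the weight is 0.
    """
    cls1 = probs_head1_b.argmax(axis=1)
    cls2 = probs_head2_c.argmax(axis=1)
    conf1 = probs_head1_b.max(axis=1)
    return np.where(cls1 == cls2, conf1, 0.0)

p1 = np.array([[0.9, 0.1], [0.4, 0.6]])  # head1_b predictions
p2 = np.array([[0.8, 0.2], [0.7, 0.3]])  # head2_c predictions
w = ugrm_weights(p1, p2)  # position 0 agrees (class 0), position 1 disagrees
```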
The semi-supervised model training method provided by this application has been introduced above. The following introduces the semi-supervised model training apparatus that implements the method. Referring to FIG. 7, it is a schematic diagram of an embodiment of the semi-supervised model training apparatus in the embodiments of this application.
The semi-supervised model training apparatus 700 includes: a processing unit 701, configured to input first enhanced data into a first model to obtain a first prediction result of the first enhanced data, and to input the first enhanced data into a second model to obtain a second prediction result of the first enhanced data, where the first enhanced data is obtained by processing first unlabeled data according to a first data enhancement method, and the second model has the same structure as the first model but different initialization parameters. The processing unit 701 is further configured to input second enhanced data into the first model to obtain a third prediction result of the second enhanced data, and to input the second enhanced data into the second model to obtain a fourth prediction result of the second enhanced data, where the second enhanced data is obtained by processing the first unlabeled data according to a second data enhancement method, and the first data enhancement method is different from the second data enhancement method. A determining unit 702 is configured to determine a first unsupervised loss according to the consistency between the first prediction result and the fourth prediction result; the determining unit 702 is further configured to determine a second unsupervised loss according to the consistency between the second prediction result and the third prediction result. The processing unit 701 is further configured to input first labeled data into the first model to obtain a fifth prediction result of the first labeled data; the determining unit 702 is further configured to determine a first supervised loss according to the fifth prediction result and the label of the first labeled data. An updating unit 703 is configured to update the parameters of the first model based on the first unsupervised loss, the second unsupervised loss, and the first supervised loss.
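The loss combination used by updating unit 703 can be sketched numerically. The trade-off weight `lam`, the toy cross-entropy, and all variable names are illustrative assumptions; the patent states only that the three losses jointly drive the parameter update:

```python
import numpy as np

def total_loss(sup_loss, unsup_loss_1, unsup_loss_2, lam=1.0):
    """Combine the first supervised loss with the two cross-model
    unsupervised (consistency) losses; lam is an assumed trade-off weight."""
    return sup_loss + lam * (unsup_loss_1 + unsup_loss_2)

def xent(p, q):
    """Toy cross-entropy between a prediction p and a target distribution q."""
    return float(-(q * np.log(p + 1e-12)).sum())

pred1_weak = np.array([0.7, 0.3])    # first prediction result (model 1, first enhanced data)
pred2_strong = np.array([0.8, 0.2])  # fourth prediction result (model 2, second enhanced data)
label = np.array([1.0, 0.0])         # label of the first labeled data

l_unsup1 = xent(pred1_weak, pred2_strong)  # consistency of results 1 and 4
l_sup = xent(pred1_weak, label)            # fifth prediction result vs. label
loss = total_loss(l_sup, l_unsup1, 0.0)    # second unsupervised loss omitted for brevity
```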
In a possible implementation, the first model includes an image semantic segmentation model; the first data enhancement method is a non-color-domain enhancement method, where the non-color-domain enhancement includes flip transformation, mirror transformation, translation transformation, or scale transformation; the second data enhancement method is a color-domain enhancement method, where the color-domain enhancement method includes brightness transformation, contrast transformation, or image Gaussian noise enhancement.
In a possible implementation, the first model includes an image semantic segmentation model, and obtaining the second enhanced data by processing the first unlabeled data according to the second data enhancement method includes: obtaining the second enhanced data by performing copy-paste enhancement on the first unlabeled data and the first labeled data.
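A minimal sketch of copy-paste enhancement for segmentation, under stated assumptions: the object mask comes from the labeled data's annotation, and the selected pixels are pasted directly without blending. The mask source, blending strategy, and names are illustrative, not taken from the patent:

```python
import numpy as np

def copy_paste(unlabeled_img, labeled_img, labeled_mask):
    """Paste the object pixels of labeled_img (where labeled_mask is True)
    onto unlabeled_img, forming a copy-paste enhanced image."""
    out = unlabeled_img.copy()
    out[labeled_mask] = labeled_img[labeled_mask]
    return out

rng = np.random.default_rng(1)
unlabeled = rng.integers(0, 256, size=(4, 4, 3))
labeled = rng.integers(0, 256, size=(4, 4, 3))
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True  # an annotated object region in the labeled image
mixed = copy_paste(unlabeled, labeled, mask)
```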
In a possible implementation, the first model is an object detection model; the first data enhancement method includes one or more of flip transformation, translation transformation, scale transformation, rotation transformation, and zoom transformation; the second data enhancement method includes one or more of flip transformation, translation transformation, scale transformation, rotation transformation, and zoom transformation.
In a possible implementation, the processing unit 701 is further configured to perform enhancement processing on the first labeled data to obtain enhanced first labeled data, where the enhanced first labeled data is used to be input into the first model to obtain the fifth prediction result.
In a possible implementation, the processing unit 701 is further configured to input the first labeled data into the second model to obtain a sixth prediction result of the first labeled data; the determining unit 702 is further configured to determine a second supervised loss according to the sixth prediction result and the label of the first labeled data; the updating unit 703 is further configured to update the parameters of the second model based on the first unsupervised loss, the second unsupervised loss, and the second supervised loss.
In a possible implementation, the processing unit 701 is specifically configured to: input the first enhanced data into a preset feature extraction network to obtain first feature data; input the first feature data into the first model to obtain the first prediction result; and input the first feature data into the second model to obtain the second prediction result.
In a possible implementation, the first model is an image semantic segmentation model, and the processing unit 701 is specifically configured to: determine a first pseudo-label corresponding to the first prediction result; if a first prediction probability corresponding to a first pixel position in the fourth prediction result is greater than a second prediction probability at the first pixel position in the first prediction result, set the weight of the first pseudo-label to 1; if the first prediction probability is less than or equal to the second prediction probability, set the weight of the first pseudo-label to 0; and determine the first unsupervised loss according to the first prediction result and the first pseudo-label.
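The per-pixel pseudo-label weight rule of this implementation can be sketched directly; the function and variable names are illustrative assumptions:

```python
import numpy as np

def pseudo_label_weights(prob_first, prob_fourth):
    """Per-pixel weight for the first pseudo-label: 1 where the fourth
    prediction result's probability exceeds the first prediction result's
    probability at that pixel, otherwise 0."""
    return (prob_fourth > prob_first).astype(np.float32)

p_first = np.array([0.6, 0.9, 0.5])   # first prediction result's probabilities
p_fourth = np.array([0.8, 0.7, 0.5])  # fourth prediction result's probabilities
w = pseudo_label_weights(p_first, p_fourth)
```

Only the first pixel (0.8 > 0.6) receives a nonzero weight; ties (0.5 vs. 0.5) fall under "less than or equal" and are weighted 0.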
It should be understood that the division of the above apparatus into units is merely a division of logical functions; in actual implementation, the units may be fully or partially integrated into one physical entity, or may be physically separate. These units may all be implemented in the form of software invoked by a processing element; they may all be implemented in the form of hardware; or some units may be implemented as software invoked by a processing element while others are implemented as hardware. For example, the above units may be one or more integrated circuits configured to implement the above method, for example, one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). For another example, when one of the above units is implemented in the form of a program scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke programs. For another example, these units may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Referring to FIG. 8, it is a schematic diagram of an embodiment of the semi-supervised model training apparatus in the embodiments of this application.
The semi-supervised model training apparatus provided in this embodiment may be a vehicle-mounted mobile apparatus, a smartphone, a deep learning training platform, an API (Application Programming Interface), a terminal device, a vehicle-mounted device, a security monitoring device, or the like; the specific device form is not limited in the embodiments of this application.
The semi-supervised model training apparatus 800 may vary considerably depending on configuration or performance, and may include one or more processors 801 and a memory 802, where the memory 802 stores programs or data.
The memory 802 may be volatile or non-volatile storage. Optionally, the processor 801 is one or more central processing units (CPUs), and the CPU may be a single-core or multi-core CPU. The processor 801 may communicate with the memory 802 and execute a series of instructions in the memory 802 on the semi-supervised model training apparatus 800.
The semi-supervised model training apparatus 800 further includes one or more wired or wireless network interfaces 803, for example, an Ethernet interface.
Optionally, although not shown in FIG. 8, the semi-supervised model training apparatus 800 may further include one or more power supplies and one or more input/output interfaces; the input/output interfaces may be used to connect a display, a mouse, a keyboard, a touchscreen device, a sensing device, or the like. The input/output interfaces are optional components that may or may not be present, which is not limited here.
For the procedure executed by the processor 801 in the semi-supervised model training apparatus 800 in this embodiment, reference may be made to the method procedures described in the foregoing method embodiments, which are not repeated here.
For content not described in detail for FIG. 7 or FIG. 8, reference may be made to the description of the relevant parts of FIGS. 3-6, and details are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division into units is merely a division of logical functions, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are merely intended to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.
Claims (19)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211215544.0A CN115577768A (en) | 2022-09-30 | 2022-09-30 | Semi-supervised model training method and device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211215544.0A CN115577768A (en) | 2022-09-30 | 2022-09-30 | Semi-supervised model training method and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115577768A true CN115577768A (en) | 2023-01-06 |
Family
ID=84583938
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202211215544.0A Pending CN115577768A (en) | 2022-09-30 | 2022-09-30 | Semi-supervised model training method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115577768A (en) |
- 2022-09-30 CN CN202211215544.0A patent/CN115577768A/en active Pending
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116403074A (en) * | 2023-04-03 | 2023-07-07 | 上海锡鼎智能科技有限公司 | Semi-automatic image labeling method and device based on active labeling |
| CN116403074B (en) * | 2023-04-03 | 2024-05-14 | 上海锡鼎智能科技有限公司 | Semi-automatic image labeling method and device based on active labeling |
| CN116993663A (en) * | 2023-06-12 | 2023-11-03 | 阿里巴巴(中国)有限公司 | Image processing method and training method of image processing model |
| CN116993663B (en) * | 2023-06-12 | 2024-04-30 | 阿里巴巴(中国)有限公司 | Image processing method and training method of image processing model |
| CN116958525A (en) * | 2023-06-15 | 2023-10-27 | 新奥新智科技有限公司 | Generalization methods, devices, equipment and media for target detection models |
| CN116894948A (en) * | 2023-08-03 | 2023-10-17 | 桂林电子科技大学 | Uncertainty guidance-based semi-supervised image segmentation method |
| CN120105677A (en) * | 2025-01-22 | 2025-06-06 | 中国船级社 | A method and device for evaluating the application effect of ship energy efficiency measures |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111210443B (en) | A Deformable Convolutional Hybrid Task Cascade Semantic Segmentation Method Based on Embedding Balance | |
| CN112288011B (en) | Image matching method based on self-attention deep neural network | |
| CN115577768A (en) | Semi-supervised model training method and device | |
| WO2022089360A1 (en) | Face detection neural network and training method, face detection method, and storage medium | |
| CN110717411A (en) | A Pedestrian Re-identification Method Based on Deep Feature Fusion | |
| CN112861575A (en) | Pedestrian structuring method, device, equipment and storage medium | |
| CN115131638B (en) | Training method, device, medium and equipment for visual text pre-training model | |
| Heo et al. | Monocular depth estimation using whole strip masking and reliability-based refinement | |
| CN113095346A (en) | Data labeling method and data labeling device | |
| CN112258436B (en) | Training method and device for image processing model, image processing method and model | |
| Yun et al. | Panoramic vision transformer for saliency detection in 360∘ videos | |
| CN114064973B (en) | Video news classification model establishing method, classification method, device and equipment | |
| CN116453121B (en) | Training method and device for lane line recognition model | |
| CN111881731A (en) | Behavior recognition method, system, device and medium based on human skeleton | |
| CN114758203B (en) | Residual Dense Visual Transformation Method and System for Hyperspectral Image Classification | |
| CN115222578A (en) | Image style transfer method, program product, storage medium and electronic device | |
| CN115115910A (en) | Training method, usage method, device, equipment and medium of image processing model | |
| CN117635941A (en) | Remote sensing image semantic segmentation method based on multi-scale features and global information modeling | |
| CN115984093A (en) | Depth estimation method based on infrared image, electronic device and storage medium | |
| CN118781502A (en) | A vehicle detection data enhancement method for UAV remote sensing images | |
| CN116682082A (en) | Vehicle digital twin method suitable for automatic driving scene | |
| CN112884780A (en) | Estimation method and system for human body posture | |
| CN109492610B (en) | Pedestrian re-identification method and device and readable storage medium | |
| CN117011569A (en) | Image processing method and related device | |
| CN111723934B (en) | Image processing method and system, electronic device and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||