
CN117576522B - A model training method and device based on dynamic defense of mimicry structure - Google Patents


Info

Publication number: CN117576522B
Application number: CN202410076456.XA
Authority: CN (China)
Prior art keywords: recognition, sub-network, trained model, image data
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN117576522A
Inventors: 张音捷, 王之宇, 张奕鹏, 白冰, 孙才俊, 孙天宁, 徐昊天
Current and original assignee: Zhejiang Lab
Application filed by Zhejiang Lab

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

This specification discloses a model training method and device based on dynamic defense of mimicry structure. The method includes: obtaining a pre-trained model and inputting a first image used to train it into the model to obtain a recognition result corresponding to the first image; determining a second image according to that recognition result and the actual label corresponding to the first image; inputting the second image into the pre-trained model, so that a weight network layer in the model determines the weight corresponding to each sub-recognition network configured in it, and each sub-recognition network recognizes the second image to obtain its own recognition result; weighting the recognition results according to the determined weights to obtain a final recognition result; and training the pre-trained model with the optimization objective of minimizing the deviation between the final recognition result and the actual label.

Description

A model training method and device based on dynamic defense of mimicry structure

Technical Field

This specification relates to the field of artificial intelligence, and in particular to a model training method and device based on dynamic defense of mimicry structure.

Background

With the rapid development of artificial intelligence, deep learning has achieved remarkable results in fields such as image recognition, natural language processing, and speech recognition. However, neural network models often perform unstably and lack robustness in complex environments involving noise, perturbations, and adversarial attacks.

Currently, when a neural network model is used to recognize images, the input samples may include adversarial samples, and the accuracy of the model's output may decrease when it must recognize them. Moreover, the variety of adversarial samples keeps growing over time, so current training methods are insufficient to obtain a highly accurate neural network model.

How to improve the accuracy of neural network model training is therefore an urgent problem to be solved.

Summary of the Invention

This specification provides a model training method and device based on dynamic defense of mimicry structure, to partially solve the above problems in the prior art.

This specification adopts the following technical solutions:

This specification provides a model training method based on dynamic defense of mimicry structure, including:

obtaining a pre-trained model;

inputting a first image used to train the pre-trained model into the pre-trained model to obtain a recognition result corresponding to the first image;

determining gradient information corresponding to the first image according to the recognition result corresponding to the first image and an actual label corresponding to the first image;

generating interference data according to the reverse gradient direction of the gradient direction corresponding to the gradient information;

adding the interference data to the first image to obtain a second image;

inputting the second image into the pre-trained model, determining, through a weight network layer in the pre-trained model, the weight corresponding to each sub-recognition network configured in the pre-trained model, recognizing the second image through each sub-recognition network to obtain the respective recognition results, and weighting the recognition results according to the determined weights to obtain a final recognition result;

training the pre-trained model with the optimization objective of minimizing the deviation between the final recognition result and the actual label.
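The weighted-recognition step above can be sketched as follows. This is a minimal illustration, assuming (since the claim does not specify the weight layer's form) a softmax weight layer and treating each sub-recognition network's output as a class-probability vector; the toy scores and distributions are not from the patent.

```python
import math

def softmax(logits):
    """Normalize the weight layer's raw scores into weights summing to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_recognize(weight_logits, sub_outputs):
    """Weight each sub-recognition network's class distribution and sum them.

    weight_logits: one raw score per sub-network from the weight layer.
    sub_outputs:   one class-probability vector per sub-network.
    """
    weights = softmax(weight_logits)
    num_classes = len(sub_outputs[0])
    final = [0.0] * num_classes
    for w, probs in zip(weights, sub_outputs):
        for i, p in enumerate(probs):
            final[i] += w * p
    return final

# Two sub-networks, three classes; the weight layer favors the first network.
result = ensemble_recognize([2.0, 0.0], [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
```

Because the weight layer's scores favor the first sub-network here, the final recognition result follows its distribution most closely.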

Optionally, the sub-recognition networks include a first sub-recognition network and a second sub-recognition network; the first sub-recognition network recognizes images input into it using recognition rules learned for the first image, and the second sub-recognition network recognizes images input into it using recognition rules learned for the second image.

Training the pre-trained model with the optimization objective of minimizing the deviation between the final recognition result and the actual label specifically includes:

fixing the network parameters of the first sub-recognition network in the pre-trained model, and adjusting the network parameters of the second sub-recognition network and of the weight network layer with the optimization objective of minimizing the deviation between the final recognition result and the actual label.
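The selective update (first sub-network frozen, second sub-network and weight layer trainable) can be illustrated as below. The parameter names (`sub1.`, `sub2.`, `weight_layer.`) and the plain gradient-descent step are hypothetical stand-ins for the actual networks and optimizer, which the text does not specify.

```python
def train_step(params, grads, lr, frozen_prefixes):
    """Apply one gradient step, skipping every parameter whose name starts
    with a frozen prefix (here: the first sub-recognition network)."""
    updated = {}
    for name, value in params.items():
        if any(name.startswith(p) for p in frozen_prefixes):
            updated[name] = value              # frozen: keep as-is
        else:
            updated[name] = value - lr * grads.get(name, 0.0)
    return updated

params = {"sub1.w": 1.0, "sub2.w": 1.0, "weight_layer.w": 1.0}
grads  = {"sub1.w": 0.5, "sub2.w": 0.5, "weight_layer.w": 0.5}
new_params = train_step(params, grads, lr=0.1, frozen_prefixes=["sub1."])
```

After the step, only the second sub-network and the weight layer have moved; the first sub-network's parameters are untouched even though a gradient was computed for them.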

Optionally, the method further includes:

when it is detected that image data inconsistent with the type of image data used to train the pre-trained model has been obtained, generating several new sub-recognition networks, deploying them into the pre-trained model, and dimensionally expanding the weight network layer according to the new sub-recognition networks to obtain an updated pre-trained model;

inputting the image data inconsistent with the type used for training, together with the original image data, as expanded image data into the updated pre-trained model; determining, through the weight network layer of the updated pre-trained model, the weights corresponding to the original sub-recognition networks and to the new sub-recognition networks; recognizing the expanded image data through each original and each new sub-recognition network to obtain the respective recognition results; and weighting those results according to the determined weights to obtain the recognition result corresponding to the expanded image data;

training the updated pre-trained model with the optimization objective of minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data.

Optionally, training the updated pre-trained model with the optimization objective of minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data specifically includes:

fixing the network parameters of the original sub-recognition networks in the pre-trained model and the network parameters of the weight network layer corresponding to the dimensions of the original sub-recognition networks;

adjusting the network parameters of each new sub-recognition network and the network parameters of the weight network layer corresponding to the expanded dimensions of the new sub-recognition networks, with the optimization objective of minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data.
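A sketch of the dimension expansion, assuming the weight layer's per-sub-network parameters can be modeled as one entry per sub-network; the zero initialization of the new entries and the boolean trainable mask are illustrative assumptions, not details given in the text.

```python
def expand_weight_layer(old_logit_params, num_new):
    """Extend the weight layer's per-sub-network parameters with freshly
    initialized entries for the new sub-recognition networks. The original
    entries are kept and marked frozen, matching the training rule above."""
    new_logit_params = list(old_logit_params) + [0.0] * num_new
    trainable_mask = [False] * len(old_logit_params) + [True] * num_new
    return new_logit_params, trainable_mask

# Two original sub-networks, two newly generated ones.
params, mask = expand_weight_layer([0.8, -0.3], num_new=2)
```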

Optionally, training the updated pre-trained model with the optimization objective of minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data specifically includes:

for the N-th round of training, obtaining a first loss value according to the deviation between the recognition result corresponding to the expanded image data obtained in the N-th round and the actual label corresponding to the expanded image data;

for each original sub-recognition network, determining a second loss value corresponding to that network according to the deviation between the recognition result it produced for the expanded image data before training and the recognition result it produced for the expanded image data after the (N-1)-th round of training;

obtaining a total loss value from the first loss value and the second loss values, and performing the N-th round of training on the pre-trained model with the optimization objective of minimizing the total loss value.
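The loss combination might be sketched as follows. Squared error for the "deviation" and the balancing coefficient `alpha` are assumptions; the text fixes neither choice.

```python
def consistency_loss(pre_training_probs, current_probs):
    """Second loss for one original sub-recognition network: the deviation
    between its output before training and its output after round N-1.
    Mean squared error is an assumed measure of that deviation."""
    n = len(pre_training_probs)
    return sum((a - b) ** 2 for a, b in zip(pre_training_probs, current_probs)) / n

def total_loss(first_loss, pre_outputs, current_outputs, alpha=1.0):
    """Combine the task (first) loss with the summed second losses over all
    original sub-networks; alpha is an assumed balancing coefficient."""
    second = sum(consistency_loss(p, c) for p, c in zip(pre_outputs, current_outputs))
    return first_loss + alpha * second

# One original sub-network whose output has not drifted: the total loss
# reduces to the first loss alone.
loss = total_loss(0.4, [[1.0, 0.0]], [[1.0, 0.0]])
```

The second term penalizes the original sub-networks drifting away from their earlier behavior, which is how the method counters the catastrophic forgetting mentioned in the background.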

This specification provides a model training device based on dynamic defense of mimicry structure, including:

an acquisition module, configured to obtain a pre-trained model;

a generation module, configured to input a first image used to train the pre-trained model into the pre-trained model to obtain a recognition result corresponding to the first image; determine gradient information corresponding to the first image according to that recognition result and the actual label corresponding to the first image; generate interference data according to the reverse gradient direction of the gradient direction corresponding to the gradient information; and add the interference data to the first image to obtain a second image;

a weighting module, configured to input the second image into the pre-trained model, determine, through the weight network layer in the pre-trained model, the weight corresponding to each sub-recognition network configured in the pre-trained model, recognize the second image through each sub-recognition network to obtain the respective recognition results, and weight those results according to the determined weights to obtain a final recognition result;

a training module, configured to train the pre-trained model with the optimization objective of minimizing the deviation between the final recognition result and the actual label.

Optionally, the sub-recognition networks include a first sub-recognition network and a second sub-recognition network; the first sub-recognition network recognizes images input into it using recognition rules learned for the first image, and the second sub-recognition network recognizes images input into it using recognition rules learned for the second image.

The training module is specifically configured to fix the network parameters of the first sub-recognition network in the pre-trained model, and to adjust the network parameters of the second sub-recognition network and of the weight network layer with the optimization objective of minimizing the deviation between the final recognition result and the actual label.

Optionally, the training module is further configured to: when it is detected that image data inconsistent with the type of image data used to train the pre-trained model has been obtained, generate several new sub-recognition networks, deploy them into the pre-trained model, and dimensionally expand the weight network layer according to the new sub-recognition networks to obtain an updated pre-trained model; input the image data inconsistent with the type used for training, together with the original image data, as expanded image data into the updated pre-trained model; determine, through the weight network layer of the updated pre-trained model, the weights corresponding to the original sub-recognition networks and to the new sub-recognition networks; recognize the expanded image data through each original and each new sub-recognition network to obtain the respective recognition results; weight those results according to the determined weights to obtain the recognition result corresponding to the expanded image data; and train the updated pre-trained model with the optimization objective of minimizing the deviation between that recognition result and the actual label corresponding to the expanded image data.

This specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above model training method based on dynamic defense of mimicry structure.

This specification provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above model training method based on dynamic defense of mimicry structure when executing the program.

At least one of the above technical solutions adopted in this specification can achieve the following beneficial effects:

In the model training method based on dynamic defense of mimicry structure provided in this specification, a pre-trained model is obtained, and the first image used to train it is input into the model to obtain the recognition result corresponding to the first image. A second image is determined according to that recognition result and the actual label corresponding to the first image. The second image is input into the pre-trained model; the weight network layer determines the weight corresponding to each sub-recognition network configured in the model, each sub-recognition network recognizes the second image to obtain its own recognition result, the results are weighted according to the determined weights to obtain a final recognition result, and the pre-trained model is trained with the optimization objective of minimizing the deviation between the final recognition result and the actual label.

As can be seen from the above, the pre-trained model contains multiple sub-recognition networks, each of which gives a different recognition result for the input image data, and the weight network layer determines, based on the sample characteristics of the image data, which sub-recognition network's result is more convincing, thereby producing a more accurate recognition result for the image data input into the pre-trained model. In practice, the variety of adversarial samples input into the model grows over time; in this specification, as large numbers of samples are input, the model generates new sub-recognition networks based on them. By continually adding sub-recognition networks and training the pre-trained model, the trained model becomes more defensive in the field of risk identification.

Brief Description of the Drawings

The drawings described here are provided for a further understanding of this specification and constitute a part of it; the illustrative embodiments and their descriptions explain this specification and do not improperly limit it. In the drawings:

FIG. 1 is a flow chart of a model training method based on dynamic defense of mimicry structure provided in this specification;

FIG. 2 is a schematic diagram of a training process of a pre-trained model provided in this specification;

FIG. 3 is a schematic diagram of a training process of a pre-trained model provided in this specification;

FIG. 4 is a schematic diagram of a model training device based on dynamic defense of mimicry structure provided in this specification;

FIG. 5 is a schematic diagram of an electronic device provided in this specification, corresponding to FIG. 1.

Detailed Description

To make the purpose, technical solutions, and advantages of this specification clearer, the technical solutions are described clearly and completely below in combination with specific embodiments and the corresponding drawings. The described embodiments are only some of the embodiments of this specification, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in this specification without creative work fall within the scope of protection of this specification.

The technical solutions provided by the embodiments of this specification are described in detail below with reference to the drawings.

FIG. 1 is a flow chart of a model training method based on dynamic defense of mimicry structure provided in this specification, including the following steps:

S101: Obtain a pre-trained model.

As an important branch of artificial intelligence, deep learning has achieved remarkable results in image recognition, natural language processing, speech recognition, and many other fields. At present, however, neural network models used for image recognition often perform unstably and lack robustness in complex environments involving noise, perturbations, and adversarial attacks; in particular, they cannot accurately recognize images when adversarial samples appear.

An adversarial sample is a special sample that causes a machine learning model to make a wrong judgment by adding carefully designed perturbations or interference to the input data. Such samples are usually slight modifications of training data, crafted so that the modified sample produces the intended wrong judgment in the model. Adversarial samples can seriously affect a machine learning model's performance, causing it to misclassify or fail to recognize data in some cases; under adversarial attack, the model's accuracy and stability may therefore be seriously compromised. Producing adversarial samples usually requires a degree of skill and knowledge to design perturbations that deceive the model; methods include, but are not limited to, adding noise, changing an image's color or brightness, and modifying a text's grammar or semantics.

In addition, as more and more adversarial samples appear over time, in endlessly varying forms, training a neural network model with traditional methods may cause catastrophic forgetting and hence inaccurate image recognition results.

Based on this, this specification provides a model training method based on dynamic defense of mimicry structure: obtain a pre-trained model, input the first image used to train it into the model, and obtain the recognition result corresponding to the first image. Determine the gradient information corresponding to the first image according to that recognition result and the first image's actual label, generate interference data according to the reverse gradient direction of the gradient direction corresponding to the gradient information, and add the interference data to the first image to obtain a second image. Input the second image into the pre-trained model; the weight network layer determines the weight corresponding to each sub-recognition network configured in the model, each sub-recognition network recognizes the second image to obtain its own recognition result, the results are weighted according to the determined weights to obtain a final recognition result, and the pre-trained model is trained with the optimization objective of minimizing the deviation between the final recognition result and the actual label.

This training approach improves the robustness of the neural network model and makes it safer and more reliable when facing adversarial samples; compared with traditional training, it improves the model's accuracy, yielding recognition results closer to the true labels.

In this specification, the execution entity implementing the model training method based on dynamic defense of mimicry structure may be a terminal device such as a laptop or tablet computer, or a server. For ease of description, this specification uses a server as the execution entity when describing the method.

In this specification, the server first obtains a pre-trained model comprising a weight network layer and one or more sub-recognition networks, where the weight network layer outputs a weight for each sub-recognition network and each sub-recognition network outputs a recognition result for the input image data.

S102: Input a first image used to train the pre-trained model into the pre-trained model to obtain a recognition result corresponding to the first image.

The server may input the first image into the sub-recognition network corresponding to it in the pre-trained model, so as to train that sub-recognition network. The first image may be of multiple types, for example ordinary image data or difficult image data.

An ordinary image is a normal image with high clarity; difficult images cover various cases, for example a blurred image, an incomplete subject, an unclear subject, or a tilted or flipped image.

Difficult image data can be generated by corrupting ordinary image data algorithmically, for example: adding noise to the image (Gaussian noise, shot noise, impulse noise, etc.); blurring the image (defocus blur, frosted-glass blur, motion blur, zoom blur, etc.); adding weather effects to the image (snow, frost, fog, etc.); or adjusting digital properties of the image such as brightness, contrast, elastic transformation, pixelation, and JPEG compression.
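As an example, one of the listed corruptions (Gaussian noise) might be applied as follows; the noise scale `sigma`, the fixed seed, and the [0, 1] pixel normalization are illustrative assumptions.

```python
import random

def add_gaussian_noise(image, sigma=0.1, seed=0):
    """Corrupt a normalized grayscale image (pixel values in [0, 1]) with
    Gaussian noise to produce a 'difficult' training image, clipping each
    pixel back into the valid range."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]

clean = [[0.5, 0.5], [0.5, 0.5]]
difficult = add_gaussian_noise(clean, sigma=0.1)
```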

进一步地,将第一图像输入到预训练模型中,得到第一图像对应的识别结果。Furthermore, the first image is input into the pre-trained model to obtain a recognition result corresponding to the first image.

S103:根据所述第一图像对应的识别结果以及所述第一图像对应的实际标签,确定所述第一图像对应的梯度信息。S103: Determine gradient information corresponding to the first image according to a recognition result corresponding to the first image and an actual label corresponding to the first image.

The server inputs the first image into the pre-trained model to obtain the recognition result corresponding to the first image, and then computes the loss function. Here the cross-entropy loss is used to compute the loss value, which can be expressed as:

$$L(x) = -\sum_{i} y_i \log(p_i)$$

where the input to the pre-trained model is $x$, the output is the image recognition result for $x$, denoted $f(x)$, $y_i$ denotes the $i$-th class of the true label, and $p_i$ is the probability of the $i$-th class in the image recognition result output by the pre-trained model.

Further, the gradient of the loss function with respect to the input $x$ is computed, which can be expressed as:

$$g = \nabla_x L(f(x), y)$$

where $g$ is the gradient information corresponding to $x$, $f(x)$ is the recognition result corresponding to $x$, and $y$ is the true label corresponding to $x$.

S104:根据所述梯度信息对应梯度方向的反向梯度方向,生成干扰数据。S104: Generate interference data according to a reverse gradient direction of a gradient direction corresponding to the gradient information.

In this specification, the server generates the interference data according to the reverse gradient direction of the gradient direction corresponding to the gradient information of $x$, producing a perturbation for the input $x$, which can be expressed as:

$$\eta = \epsilon \cdot \mathrm{sign}\big(\nabla_x L(f(x), y)\big)$$

where $\eta$ is the generated interference data and $\epsilon$ is the perturbation coefficient: the larger $\epsilon$, the stronger the perturbation. $\mathrm{sign}(\cdot)$ is the sign function. By computing the gradient information of the loss function with respect to the input $x$ and converting it with the $\mathrm{sign}$ function, interference data pointing in the direction of increasing loss can be generated, thereby achieving the purpose of an adversarial attack.

S105:将所述干扰数据加入到所述第一图像中,得到第二图像。S105: Add the interference data to the first image to obtain a second image.

The server adds the generated interference data to the first image to obtain the second image, which can be expressed as:

$$x' = x + \eta$$

where $x'$ is the image data obtained after adding the perturbation to the input $x$ of the pre-trained model; that is, $x'$ is the second image obtained by adding the interference data to the first image.
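Steps S103 through S105 together amount to a sign-gradient perturbation in the style of FGSM. A minimal numpy sketch, assuming the gradient of the loss with respect to the input has already been obtained by backpropagation (function names are illustrative):

```python
import numpy as np

def make_interference(grad, eps=0.03):
    """S104: build interference data from the sign of the loss gradient."""
    return eps * np.sign(grad)

def second_image(x, grad, eps=0.03):
    """S105: add the interference data to the first image x.
    Clipping to the valid pixel range is a common practical addition,
    not something stated in the text."""
    return np.clip(x + make_interference(grad, eps), 0.0, 1.0)
```

Larger values of `eps` correspond to the stronger perturbations described above.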

S106: Input the second image into the pre-trained model, so as to determine, through the weight network layer in the pre-trained model, the weight corresponding to each sub-recognition network set in the pre-trained model; recognize the second image through each sub-recognition network to obtain the respective recognition results; and weight the recognition results according to the determined weights corresponding to the sub-recognition networks to obtain the final recognition result.

预训练模型中包含有权重网络层和各子识别网络,其中,各子识别网络中包含有第一子识别网络以及第二子识别网络,第一子识别网络用于通过学习到的针对第一图像进行识别的识别规则,对输入到第一子识别网络中的图像进行识别,第二子识别网络用于通过学习到的针对第二图像进行识别的识别规则,对输入到第二子识别网络中的图像进行识别。The pre-trained model includes a weighted network layer and sub-recognition networks, wherein each sub-recognition network includes a first sub-recognition network and a second sub-recognition network. The first sub-recognition network is used to recognize the image input into the first sub-recognition network by using the learned recognition rules for recognizing the first image, and the second sub-recognition network is used to recognize the image input into the second sub-recognition network by using the learned recognition rules for recognizing the second image.

服务器将第二图像输入到预训练模型中,此时,第二图像通过预训练模型中的权重网络层,得到预训练模型中设置的各子识别网络对应的权重。并将第二图像分别输入到预训练模型中的各子识别网络中,以通过每个子识别网络,分别对第二图像进行识别,得到各识别结果,并根据确定出的各子识别网络对应的权重,对各识别结果进行加权,得到最终识别结果。The server inputs the second image into the pre-trained model. At this time, the second image passes through the weight network layer in the pre-trained model to obtain the weights corresponding to each sub-recognition network set in the pre-trained model. The second image is input into each sub-recognition network in the pre-trained model respectively, so that each sub-recognition network recognizes the second image respectively to obtain each recognition result, and each recognition result is weighted according to the weights corresponding to each sub-recognition network determined to obtain the final recognition result.
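The fusion described here can be sketched as follows, with the weight network layer and the sub-recognition networks stubbed as plain callables; all names are illustrative:

```python
import numpy as np

def softmax(z):
    """Normalize raw weights so they are positive and sum to 1."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def fused_recognition(x, weight_layer, sub_networks):
    """Weight each sub-recognition network's result by the weight
    network layer's output for it, then sum to get the final result."""
    w = softmax(weight_layer(x))                     # one weight per sub-network
    outputs = np.stack([net(x) for net in sub_networks])
    return w @ outputs                               # weighted final recognition result
```

With equal weights, the final result is simply the average of the sub-network outputs.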

It should be noted that the second sub-recognition network in the pre-trained model may be a freshly initialized sub-recognition network that is subsequently trained by the model training method described in S107. Alternatively, the server may first input the second image into the second sub-recognition network of the pre-trained model and train that network alone, so that it learns recognition rules for recognizing the second image and can recognize images input into it; the trained second sub-recognition network is then deployed into the pre-trained model and trained further by the model training method described in S107.

在本说明书中,当服务器使用第一图像对第一子识别网络进行训练以及使用第二图像对第二子识别网络进行训练时,也可以在第一图像中添加少量第二图像,同样的,在第二图像中添加少量第一图像,这样可以增强每个子识别网络自身的泛化能力,减少过拟合现象。In this specification, when the server uses the first image to train the first sub-recognition network and uses the second image to train the second sub-recognition network, a small amount of the second image can also be added to the first image. Similarly, a small amount of the first image can be added to the second image. This can enhance the generalization ability of each sub-recognition network itself and reduce overfitting.

S107:以最小化所述最终识别结果与所述实际标签之间的偏差为优化目标,对所述预训练模型进行训练。S107: The pre-training model is trained with the optimization goal of minimizing the deviation between the final recognition result and the actual label.

In this specification, when the server inputs the second image into the pre-trained model and trains the pre-trained model, two training methods are provided. One method is to train the pre-trained model with minimizing the deviation between the final recognition result and the actual label as the optimization goal, updating the parameters of the weight network layer and of both the first and the second sub-recognition network; that is, in this method the parameters of all network layers are adjusted during training.

The other method is to fix the network parameters of the first sub-recognition network in the pre-trained model and, with minimizing the deviation between the final recognition result and the actual label as the optimization goal, adjust only the network parameters of the second sub-recognition network and of the weight network layer. That is, the parameters of the already-trained first sub-recognition network need no adjustment. The first sub-recognition network in the pre-trained model has learned the recognition rules for the first image in advance: the server trains it separately on the first image beforehand and adjusts its network parameters according to its recognition results so that it can recognize the first image. Therefore, during training of the pre-trained model, the network parameters of the first sub-recognition network can be fixed, and only those of the second sub-recognition network and the weight network layer are adjusted.
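The second training mode — freeze the first sub-recognition network, update only the second sub-network and the weight layer — can be sketched as a gradient step that skips frozen parameter groups; the parameter-group names are illustrative:

```python
def sgd_step(params, grads, frozen, lr=0.1):
    """One SGD update that leaves frozen parameter groups untouched.
    params/grads: dicts mapping parameter-group name to value/gradient."""
    return {name: (value if name in frozen else value - lr * grads[name])
            for name, value in params.items()}
```

In the first training mode, `frozen` would simply be empty, so every parameter group is adjusted.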

In addition, the server may detect that the pre-trained model is receiving image data whose type does not match that of the image data used to train it. Specifically, if over some period of time the server observes that the accuracy of the recognition results for the image data input during that period drops significantly, it can conclude that the type of image data now being input differs from the type input previously, and the pre-trained model therefore needs to be trained on the image data from that period. The server then generates several new sub-recognition networks, deploys them into the pre-trained model, and expands the dimensions of the weight network layer according to the new sub-recognition networks, obtaining an updated pre-trained model.
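The dimension expansion of the weight network layer when new sub-recognition networks are deployed can be sketched as appending freshly initialized output rows while keeping the rows for the original sub-recognition networks intact; the initialization scale here is an illustrative choice:

```python
import numpy as np

def expand_weight_layer(W, n_new, seed=0):
    """Add one output row per new sub-recognition network to the weight
    layer's output matrix W; the original rows are preserved unchanged."""
    rng = np.random.default_rng(seed)
    new_rows = rng.normal(0.0, 0.01, size=(n_new, W.shape[1]))
    return np.vstack([W, new_rows])
```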

The server takes the image data whose type does not match that of the training data, together with the original image data, as the expanded image data and inputs it into the updated pre-trained model. That is, after obtaining the mismatched image data, the server fuses it with the original image data into one sample set; the image data in this sample set is the expanded image data, which is input into the updated pre-trained model. The original image data may include the first image and the second image. Training the updated pre-trained model on the mismatched image data together with the original image data prevents the new sub-recognition networks in the pre-trained model from overfitting, and also, to a certain extent, prevents the parameters of the original sub-recognition networks from drifting too far during updates.

进一步地,通过更新后的预训练模型中的权重网络层,确定原始的各子识别网络对应的权重和各新的子识别网络对应的权重,以及通过原始的每个子识别网络和每个新的子识别网络,分别对扩充后图像数据进行识别,得到各识别结果,并根据确定出的原始的各子识别网络对应的权重和各新的子识别网络对应的权重,对各识别结果进行加权,得到扩充后图像数据对应的识别结果。进而服务器以最小化扩充后图像数据对应的识别结果与扩充后图像数据对应的实际标签之间的偏差为优化目标,对更新后的预训练模型进行训练。Furthermore, the weights corresponding to the original sub-recognition networks and the weights corresponding to the new sub-recognition networks are determined through the weight network layer in the updated pre-trained model, and the expanded image data is recognized through each original sub-recognition network and each new sub-recognition network to obtain each recognition result, and each recognition result is weighted according to the determined weights corresponding to the original sub-recognition networks and the weights corresponding to the new sub-recognition networks to obtain the recognition result corresponding to the expanded image data. Then, the server trains the updated pre-trained model with the optimization goal of minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data.

这里的预训练模型的训练方式也涉及两种,一种是以最小化扩充后图像数据对应的识别结果与扩充后图像数据对应的实际标签之间的偏差为优化目标,对更新后的预训练模型进行训练,并对预训练模型中的权重网络层的所有网络参数以及各子识别网络的网络参数进行更新,如图2所示。There are also two training methods for the pre-trained model here. One is to minimize the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as the optimization goal, train the updated pre-trained model, and update all network parameters of the weight network layer in the pre-trained model and the network parameters of each sub-recognition network, as shown in Figure 2.

图2为本说明书提供的一种预训练模型的训练过程示意图。FIG2 is a schematic diagram of a training process of a pre-training model provided in this specification.

图2中有阴影的部分代表参与训练的部分,也就是说,在本说明书提供的第一种训练方式中,可以对预训练模型中的权重网络层的所有网络参数以及各子识别网络的网络参数进行更新,以得到更加融合的权重网络层和各子识别网络,从而使得预训练模型更加准确。The shaded part in Figure 2 represents the part involved in the training, that is, in the first training method provided in this specification, all network parameters of the weight network layer in the pre-trained model and the network parameters of each sub-recognition network can be updated to obtain a more integrated weight network layer and each sub-recognition network, thereby making the pre-trained model more accurate.

另一种是固定预训练模型中原始的各子识别网络的网络参数以及权重网络层中针对原始的各子识别网络的维度所对应的网络参数,并以最小化扩充后图像数据对应的识别结果与扩充后图像数据对应的实际标签之间的偏差为优化目标,对各新的子识别网络中的网络参数以及权重网络层中针对各新的子识别网络的扩展维度对应的网络参数进行调整,如图3所示。The other is to fix the network parameters of the original sub-recognition networks in the pre-trained model and the network parameters corresponding to the dimensions of the original sub-recognition networks in the weight network layer, and take minimizing the deviation between the recognition results corresponding to the expanded image data and the actual labels corresponding to the expanded image data as the optimization goal, and adjust the network parameters in each new sub-recognition network and the network parameters corresponding to the expanded dimensions of each new sub-recognition network in the weight network layer, as shown in Figure 3.

图3为本说明书提供的一种预训练模型的训练过程示意图。FIG3 is a schematic diagram of a training process of a pre-training model provided in this specification.

图3中有阴影的部分代表参与训练的部分,也就是说,在本说明书提供的第二种训练方式中,参与到训练过程中的网络参数包括各新的子识别网络中的网络参数以及权重网络层中针对各新的子识别网络的扩展维度对应的网络参数,这样的方式可以减小模型的训练压力,只对预训练模型中的一部分参数进行调整,并且减少预训练模型发生灾难性遗忘的问题。The shaded part in Figure 3 represents the part involved in the training, that is, in the second training method provided in this specification, the network parameters involved in the training process include the network parameters in each new sub-recognition network and the network parameters corresponding to the extended dimensions of each new sub-recognition network in the weight network layer. This method can reduce the training pressure of the model, only adjust a part of the parameters in the pre-trained model, and reduce the problem of catastrophic forgetting in the pre-trained model.

在第一种训练方式中,因为是对各子识别网络中的网络参数以及权重网络层中的所有网络参数进行调整,所以为了使模型训练效果更好,服务器可以通过计算两个部分的损失函数,以得到两个损失值,从而根据这两个损失值,得到总损失值。In the first training method, because the network parameters in each sub-recognition network and all network parameters in the weight network layer are adjusted, in order to make the model training effect better, the server can calculate the loss function of the two parts to obtain two loss values, and then obtain the total loss value based on these two loss values.

具体地,针对第N轮次的训练,服务器根据在第N轮次训练中得到的扩充后图像数据对应的识别结果与扩充后图像数据对应的实际标签之间的偏差,得到第一损失值,具体可表示为:Specifically, for the Nth round of training, the server obtains a first loss value according to the deviation between the recognition result corresponding to the expanded image data obtained in the Nth round of training and the actual label corresponding to the expanded image data, which can be specifically expressed as:

$$L_1 = -\sum_{i} \tilde{y}_i \log\Big(\sum_{j=1}^{n} w_j\, p_{j,i}\Big)$$

where $L_1$ is the first loss value, $n$ is the number of sub-recognition networks, $\tilde{y}_i$ is the $i$-th class of the actual label of the expanded image data, $p_{j,i}$ is the probability of the $i$-th class output by the $j$-th sub-recognition network, and $w_j$ is the weight value output by the weight network layer for the $j$-th sub-recognition network. Each $w_j$ is a value processed by the $\mathrm{softmax}$ function, which guarantees $\sum_{j=1}^{n} w_j = 1$. The $\mathrm{softmax}$ processing can be specifically expressed as:

$$w_j = \frac{e^{z_j}}{\sum_{k=1}^{n} e^{z_k}}$$

where $z_j$ is the raw weight output by the weight network layer for the $j$-th sub-recognition network.
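A sketch of computing the first loss value, assuming it is the cross-entropy between the softmax-weighted fusion of the sub-network outputs and the actual label — an assumption consistent with the surrounding description; function names are illustrative:

```python
import numpy as np

def softmax(z):
    """Normalize raw weights so they sum to 1, as the text requires."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def first_loss(weight_logits, sub_outputs, y_true):
    """Cross-entropy between the weighted fused recognition result
    and the actual label of the expanded image data."""
    w = softmax(np.asarray(weight_logits))
    fused = w @ np.stack(sub_outputs)          # weighted fusion of sub-network outputs
    return float(-np.sum(np.asarray(y_true) * np.log(fused + 1e-12)))
```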

进一步地,由于预训练模型中原始的子识别网络可以是已经训练过的子识别网络,为防止第N轮训练出的预训练模型与训练前的预训练模型中的网络参数偏移过远,造成灾难性遗忘的问题,进而计算第二损失值。Furthermore, since the original sub-recognition network in the pre-trained model may be a sub-recognition network that has been trained, in order to prevent the pre-trained model trained in the Nth round from deviating too far from the network parameters in the pre-trained model before training, causing catastrophic forgetting, the second loss value is calculated.

Specifically, for each original sub-recognition network, the server determines the second loss value corresponding to that network according to the deviation between the recognition results that the original sub-recognition network in the pre-trained model produced for the expanded image data before training and the recognition results it produces for the expanded image data after the $(N-1)$-th round of training. The second loss value can be specifically expressed as:

$$L_2 = \mathrm{JS}(P \,\|\, Q)$$

where $P$ denotes the recognition results obtained by the original sub-recognition networks in the pre-trained model for the expanded image data before training, and $Q$ denotes the recognition results obtained by the original sub-recognition networks for the expanded image data after the $(N-1)$-th round of training. Specifically, the JS divergence is computed as:

$$\mathrm{JS}(P \,\|\, Q) = \frac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \frac{1}{2}\,\mathrm{KL}(Q \,\|\, M)$$

where $M$ denotes the average distribution of $P$ and $Q$, computed as:

$$M = \frac{1}{2}(P + Q)$$

and $\mathrm{KL}(\cdot\,\|\,\cdot)$ denotes the KL divergence, computed as:

$$\mathrm{KL}(P \,\|\, Q) = \sum_{i} P(i)\,\log\frac{P(i)}{Q(i)}$$
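The JS divergence between the two sets of recognition results can be computed directly from its definition in terms of KL divergence; a small numpy sketch, assuming strictly positive probability vectors so the logarithms are defined:

```python
import numpy as np

def kl(p, q):
    """KL divergence between discrete distributions p and q."""
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    """JS divergence: average of KL(p||m) and KL(q||m), with m = (p+q)/2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Unlike KL, JS is symmetric and bounded, which makes it a stable penalty for keeping the post-training outputs close to the pre-training outputs.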

Further, the server can obtain the second loss value through the above formulas, and then obtains the total loss value from the first loss value and the second loss value, which can be specifically expressed as:

$$L = L_1 + \lambda\, L_2$$

where $\lambda$ is a tempering factor used to control the forgetting ratio of the sub-recognition networks. The server performs the $N$-th round of training on the pre-trained model with minimizing the total loss value as the optimization goal.

It should be noted that when the server detects that it is receiving image data whose type does not match that of the image data used to train the pre-trained model, it generates several new sub-recognition networks. However, when many new sub-recognition networks are generated, inputting the expanded image data (the mismatched image data together with the original image data) into every sub-recognition network would increase the amount of computation of the pre-trained model. Therefore, when the number of sub-recognition networks is large, $\mathrm{top}\text{-}k$ sampling is applied to the weights of the sub-recognition networks before $\mathrm{softmax}$ processing: if a sub-recognition network's weight is within the top $k$, it is kept; otherwise, that sub-recognition network's weight is directly set to negative infinity.

The weights above still need to pass through the $\mathrm{softmax}$ function. For the sub-recognition networks outside the $\mathrm{top}\text{-}k$, the corresponding weight is negative infinity, so after $\mathrm{softmax}$ processing their weight becomes 0, which effectively reduces the amount of computation. For example, when $k = 3$ in the $\mathrm{top}\text{-}k$ sampling, only the three sub-recognition networks whose weight values are in the top 3 are kept; the server inputs the expanded image data into these three retained sub-recognition networks and weights the three recognition results according to the corresponding weights to obtain the recognition result for the expanded image data.
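The top-k screening described here — keep the k largest pre-softmax weights, set the rest to negative infinity so that softmax assigns them exactly zero — can be sketched as follows (names are illustrative):

```python
import numpy as np

def topk_softmax(logits, k):
    """Keep only the k largest pre-softmax weights; the rest are set to
    -inf, so softmax gives those sub-recognition networks weight 0."""
    logits = np.asarray(logits, dtype=float)
    masked = np.full_like(logits, -np.inf)
    keep = np.argsort(logits)[-k:]           # indices of the top-k weights
    masked[keep] = logits[keep]
    e = np.exp(masked - logits[keep].max())  # exp(-inf) == 0.0
    return e / e.sum()
```

Only the sub-networks with nonzero weight then need to run on the expanded image data, which is where the computation savings come from.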

在本说明书中,通过第一图像对应的识别结果以及第一图像对应的实际标签,得到第二图像,再将第二图像输入到预训练模型中,并对预训练模型中的网络参数进行调整更新,以得到更加准确的识别结果。In this specification, a second image is obtained through the recognition result corresponding to the first image and the actual label corresponding to the first image, and then the second image is input into a pre-trained model, and the network parameters in the pre-trained model are adjusted and updated to obtain a more accurate recognition result.

当服务器监测到输入预训练模型中的图像数据的类型与预训练模型所使用的图像数据的类型不符时,可以生成若干新的子识别网络,以便增加预训练模型对环境的适应性,进而针对多种图像数据,也能有很好的识别结果。When the server detects that the type of image data input into the pre-trained model does not match the type of image data used by the pre-trained model, it can generate several new sub-recognition networks to increase the adaptability of the pre-trained model to the environment, thereby achieving good recognition results for a variety of image data.

并且在预训练模型的训练过程中,还提供两种训练方式,一种是对预训练模型中的权重网络层和各子识别网络的网络参数都进行调整,这种方式能够实现预训练模型中的权重网络层和各子识别网络的深度融合,以得到准确性较高的预训练模型。In addition, during the training process of the pre-trained model, two training methods are also provided. One is to adjust the network parameters of the weight network layer and each sub-recognition network in the pre-trained model. This method can achieve a deep fusion of the weight network layer and each sub-recognition network in the pre-trained model to obtain a pre-trained model with higher accuracy.

另一种是固定预训练模型中原始的各子识别网络的网络参数以及权重网络层中针对原始的各子识别网络的维度所对应的网络参数,只对各新的子识别网络中的网络参数以及权重网络层中针对各新的子识别网络的扩展维度对应的网络参数进行调整,这样的方式可以保留之前的训练效果,防止发生灾难性遗忘。The other is to fix the network parameters of the original sub-recognition networks in the pre-trained model and the network parameters corresponding to the dimensions of the original sub-recognition networks in the weight network layer, and only adjust the network parameters in each new sub-recognition network and the network parameters corresponding to the extended dimensions of each new sub-recognition network in the weight network layer. This method can retain the previous training effects and prevent catastrophic forgetting.

这两种方式适应的场景不同,在实际应用中需要根据不同的情况对预训练模型进行训练,也可以根据输入的图像数据,选择正确的训练方式,以得到更加准确的预训练模型。These two methods are suitable for different scenarios. In practical applications, the pre-trained model needs to be trained according to different situations. You can also choose the correct training method according to the input image data to obtain a more accurate pre-trained model.

本说明书提供的方法可以应用于多种图像识别场景中,尤其是在风险识别领域对图像进行识别,例如,在人脸识别领域,可以通过采集到的人脸图像来判断用户是否存在业务风险,本说明书提供的方法即可应用在上述领域中。通过本说明书可以看出,随着对抗性样本种类的增加,自动添加子识别网络,并通过上述的训练方式对各子识别网络和权重网络层的网络参数进行调整,使其能够得到准确的识别结果。因此,虽然随着时间的推移风险种类逐渐增加,本说明书提供的方法可以根据图像数据的样本特性对预训练模型进行训练,使得预训练模型可以学习出针对新风险的子识别网络,实现了模型动态防御的过程。The method provided in this specification can be applied to a variety of image recognition scenarios, especially in the field of risk identification. For example, in the field of face recognition, the collected face images can be used to determine whether the user has business risks. The method provided in this specification can be applied in the above-mentioned fields. It can be seen from this specification that as the types of adversarial samples increase, sub-recognition networks are automatically added, and the network parameters of each sub-recognition network and weight network layer are adjusted through the above-mentioned training method, so that accurate recognition results can be obtained. Therefore, although the types of risks gradually increase over time, the method provided in this specification can train the pre-trained model according to the sample characteristics of the image data, so that the pre-trained model can learn sub-recognition networks for new risks, realizing the process of dynamic defense of the model.

为了进一步描述本说明书中提供的方法,这里提供一种预训练模型的网络结构,其中包含三个子识别网络:A、B和C,A或B可以是通过第一图像得到的第一子识别网络,C可以是通过第二图像得到的第二子识别网络。其中,A主要是用于识别普通图像数据的,B主要是用于识别困难图像数据的,C主要是用于识别对抗性图像数据的。A、B和C都能够对三种不同的图像数据进行识别,并得到相应的结果,在预训练模型的预训练过程中,分别使用普通图像数据、困难图像数据和对抗性图像数据对A、B和C进行单独训练,将训练后的A、B和C部署到预训练模型中,并在预训练模型中设置权重网络层,再对预训练模型进行训练,使得权重网络层和三个子识别网络能够学习到彼此的学习特性,以使预训练模型能够根据不同图像数据的输入,得到更加合理的识别结果。同时在权重网络层的作用下,对三个子识别网络的识别结果进行融合,使得得到的最终识别结果更加准确,实现了动态防御的过程。In order to further describe the method provided in this specification, a network structure of a pre-trained model is provided here, which includes three sub-recognition networks: A, B and C. A or B can be a first sub-recognition network obtained through a first image, and C can be a second sub-recognition network obtained through a second image. Among them, A is mainly used to identify ordinary image data, B is mainly used to identify difficult image data, and C is mainly used to identify adversarial image data. A, B and C can all identify three different image data and obtain corresponding results. In the pre-training process of the pre-trained model, ordinary image data, difficult image data and adversarial image data are used to train A, B and C separately, and the trained A, B and C are deployed to the pre-trained model, and a weight network layer is set in the pre-trained model, and then the pre-trained model is trained so that the weight network layer and the three sub-recognition networks can learn each other's learning characteristics, so that the pre-trained model can obtain more reasonable recognition results according to the input of different image data. At the same time, under the action of the weight network layer, the recognition results of the three sub-recognition networks are fused, so that the final recognition result obtained is more accurate, and the dynamic defense process is realized.

以上为本说明书的一个或多个实施基于拟态结构动态防御的模型训练方法,基于同样的思路,本说明书还提供了相应的基于拟态结构动态防御的模型训练装置,如图4所示。The above is one or more implementations of the model training method based on dynamic defense of mimetic structure in this specification. Based on the same idea, this specification also provides a corresponding model training device based on dynamic defense of mimetic structure, as shown in FIG4 .

图4为本说明书提供的一种基于拟态结构动态防御的模型训练装置的示意图,包括:FIG4 is a schematic diagram of a model training device based on dynamic defense of mimicry structure provided in this specification, including:

获取模块401,用于获取预训练模型;An acquisition module 401 is used to acquire a pre-trained model;

生成模块402,用于将训练所述预训练模型所使用的第一图像输入到所述预训练模型中,得到所述第一图像对应的识别结果;根据所述第一图像对应的识别结果以及所述第一图像对应的实际标签,确定所述第一图像对应的梯度信息;根据所述梯度信息对应梯度方向的反向梯度方向,生成干扰数据;将所述干扰数据加入到所述第一图像中,得到第二图像;A generating module 402 is used to input the first image used for training the pre-trained model into the pre-trained model to obtain a recognition result corresponding to the first image; determine the gradient information corresponding to the first image according to the recognition result corresponding to the first image and the actual label corresponding to the first image; generate interference data according to the reverse gradient direction of the gradient direction corresponding to the gradient information; and add the interference data to the first image to obtain a second image;

加权模块403,用于将所述第二图像输入到所述预训练模型中,以通过所述预训练模型中的权重网络层,确定所述预训练模型中设置的各子识别网络对应的权重,以及通过每个子识别网络,分别对所述第二图像进行识别,得到各识别结果,并根据确定出的所述各子识别网络对应的权重,对所述各识别结果进行加权,得到最终识别结果;A weighting module 403 is used to input the second image into the pre-trained model, determine the weights corresponding to the sub-recognition networks set in the pre-trained model through the weight network layer in the pre-trained model, and recognize the second image through each sub-recognition network to obtain each recognition result, and weight the recognition results according to the determined weights corresponding to the sub-recognition networks to obtain a final recognition result;

训练模块404,用于以最小化所述最终识别结果与所述实际标签之间的偏差为优化目标,对所述预训练模型进行训练。The training module 404 is used to train the pre-training model with minimizing the deviation between the final recognition result and the actual label as the optimization goal.

可选地,所述各子识别网络中包含有第一子识别网络以及第二子识别网络,所述第一子识别网络用于通过学习到的针对第一图像进行识别的识别规则,对输入到第一子识别网络中的图像进行识别,所述第二子识别网络用于通过学习到的针对第二图像进行识别的识别规则,对输入到第二子识别网络中的图像进行识别;Optionally, each of the sub-recognition networks includes a first sub-recognition network and a second sub-recognition network, the first sub-recognition network is used to recognize an image input into the first sub-recognition network by using a learned recognition rule for recognizing a first image, and the second sub-recognition network is used to recognize an image input into the second sub-recognition network by using a learned recognition rule for recognizing a second image;

所述训练模块404具体用于,固定所述预训练模型中所述第一子识别网络的网络参数,并以最小化所述最终识别结果与所述实际标签之间的偏差为优化目标,对所述第二子识别网络中的网络参数以及所述权重网络层中的网络参数进行调整。The training module 404 is specifically used to fix the network parameters of the first sub-recognition network in the pre-trained model, and adjust the network parameters in the second sub-recognition network and the network parameters in the weight network layer with the optimization goal of minimizing the deviation between the final recognition result and the actual label.

可选地,所述训练模块404还用于,当监测到获取与训练所述预训练模型所使用的图像数据的类型不符的图像数据时,生成若干新的子识别网络,并将所述新的子识别网络部署到所述预训练模型中,并根据所述新的子识别网络,对所述权重网络层进行维度扩展,得到更新后的预训练模型;将获取到与训练所述预训练模型所使用的图像数据的类型不符的图像数据以及原始的图像数据作为扩充后图像数据输入到所述更新后的预训练模型中,以通过所述更新后的预训练模型中的权重网络层,确定原始的各子识别网络对应的权重和各新的子识别网络对应的权重,以及通过原始的每个子识别网络和每个新的子识别网络,分别对所述扩充后图像数据进行识别,得到各识别结果,并根据确定出的原始的各子识别网络对应的权重和所述各新的子识别网络对应的权重,对所述各识别结果进行加权,得到所述扩充后图像数据对应的识别结果;以最小化所述扩充后图像数据对应的识别结果与所述扩充后图像数据对应的实际标签之间的偏差为优化目标,对所述更新后的预训练模型进行训练。Optionally, the training module 404 is also used to, when it is monitored that image data of a type inconsistent with the image data used for training the pre-trained model is obtained, generate several new sub-recognition networks, deploy the new sub-recognition networks into the pre-trained model, and according to the new sub-recognition networks, dimensionally expand the weight network layer to obtain an updated pre-trained model; input the image data of a type inconsistent with the image data used for training the pre-trained model and the original image data as expanded image data into the updated pre-trained model, so as to pass through the weight network layer in the updated pre-trained model, Determine the weights corresponding to the original sub-recognition networks and the weights corresponding to the new sub-recognition networks, and respectively recognize the expanded image data through each original sub-recognition network and each new sub-recognition network to obtain each recognition result, and weight the recognition results according to the determined weights corresponding to the original sub-recognition networks and the weights corresponding to the new sub-recognition networks to obtain the recognition result corresponding to the expanded image data; train the updated pre-trained model with the optimization goal of minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data.

可选地,所述训练模块404具体用于,固定所述预训练模型中原始的各子识别网络的网络参数以及所述权重网络层中针对原始的各子识别网络的维度所对应的网络参数;以最小化所述扩充后图像数据对应的识别结果与所述扩充后图像数据对应的实际标签之间的偏差为优化目标,对所述各新的子识别网络中的网络参数以及所述权重网络层中针对所述各新的子识别网络的扩展维度对应的网络参数进行调整。Optionally, the training module 404 is specifically used to fix the network parameters of the original sub-recognition networks in the pre-trained model and the network parameters corresponding to the dimensions of the original sub-recognition networks in the weight network layer; and adjust the network parameters in the new sub-recognition networks and the network parameters corresponding to the expanded dimensions of the new sub-recognition networks in the weight network layer with the optimization goal of minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data.

可选地，所述训练模块404具体用于，针对第N轮次的训练，根据在第N轮次训练中得到的所述扩充后图像数据对应的识别结果与所述扩充后图像数据对应的实际标签之间的偏差，得到第一损失值；针对每个原始的子识别网络，根据训练前所述预训练模型中原始的该子识别网络针对所述扩充后图像数据所得到的识别结果，与经过第N-1轮次训练后所述预训练模型中原始的该子识别网络针对所述扩充后图像数据所得到的识别结果之间的偏差，确定原始的该子识别网络所对应的第二损失值；根据所述第一损失值和所述第二损失值，得到总损失值，以通过最小化所述总损失值为优化目标，对所述预训练模型进行第N轮次的训练。Optionally, the training module 404 is specifically used to: for the Nth round of training, obtain a first loss value according to the deviation between the recognition result corresponding to the expanded image data obtained in the Nth round of training and the actual label corresponding to the expanded image data; for each original sub-recognition network, determine a second loss value corresponding to that sub-recognition network according to the deviation between the recognition result it produced for the expanded image data in the pre-trained model before training and the recognition result it produced for the expanded image data after the (N-1)th round of training; and obtain a total loss value according to the first loss value and the second loss value, so as to perform the Nth round of training on the pre-trained model with the optimization goal of minimizing the total loss value.
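The two-part objective above — a task loss on the expanded data plus a penalty that keeps each original sub-network close to its earlier outputs — resembles a distillation-style consistency regularizer. A hedged, minimal sketch: squared error stands in for whatever deviation measure the patent intends, and `lam` is an assumed balancing coefficient not named in the text:

```python
def first_loss(pred, label):
    """Task loss: deviation between the round-N recognition result and the label."""
    return sum((p - y) ** 2 for p, y in zip(pred, label)) / len(pred)


def second_loss(old_out, new_out):
    """Consistency loss for one original sub-network: deviation between its
    output before training and its output after round N-1."""
    return sum((o - n) ** 2 for o, n in zip(old_out, new_out)) / len(old_out)


def total_loss(pred, label, old_outs, new_outs, lam=1.0):
    """Total objective minimized in round N: first loss + weighted second losses."""
    consistency = sum(second_loss(o, n) for o, n in zip(old_outs, new_outs))
    return first_loss(pred, label) + lam * consistency


loss = total_loss(
    pred=[0.7, 0.3], label=[1.0, 0.0],
    old_outs=[[0.8, 0.2]], new_outs=[[0.6, 0.4]],  # one original sub-network
    lam=0.5,
)
```

Minimizing the total keeps the updated model accurate on the expanded data while discouraging the original sub-networks from drifting away from what they already learned.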

本说明书还提供了一种计算机可读存储介质,该存储介质存储有计算机程序,计算机程序可用于执行上述图1提供的一种基于拟态结构动态防御的模型训练方法。This specification also provides a computer-readable storage medium, which stores a computer program. The computer program can be used to execute a model training method based on dynamic defense of mimicry structure provided in FIG. 1 above.

本说明书还提供了图5所示的一种对应于图1的电子设备的示意结构图。如图5所示，在硬件层面，该电子设备包括处理器、内部总线、网络接口、内存以及非易失性存储器，当然还可能包括其他业务所需要的硬件。处理器从非易失性存储器中读取对应的计算机程序到内存中然后运行，以实现上述图1所述的基于拟态结构动态防御的模型训练方法。当然，除了软件实现方式之外，本说明书并不排除其他实现方式，比如逻辑器件抑或软硬件结合的方式等等，也就是说以下处理流程的执行主体并不限定于各个逻辑单元，也可以是硬件或逻辑器件。This specification also provides a schematic structural diagram, shown in Figure 5, of an electronic device corresponding to Figure 1. As shown in Figure 5, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, and of course may also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into the memory and then runs it, so as to implement the model training method based on dynamic defense of mimicry structure described in Figure 1 above. Of course, in addition to software implementations, this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to the logic units, and may also be hardware or logic devices.

对于一个技术的改进可以很明显地区分是硬件上的改进(例如，对二极管、晶体管、开关等电路结构的改进)还是软件上的改进(对于方法流程的改进)。然而，随着技术的发展，当今的很多方法流程的改进已经可以视为硬件电路结构的直接改进。设计人员几乎都通过将改进的方法流程编程到硬件电路中来得到相应的硬件电路结构。因此，不能说一个方法流程的改进就不能用硬件实体模块来实现。例如，可编程逻辑器件(Programmable Logic Device, PLD)(例如现场可编程门阵列(Field Programmable Gate Array, FPGA))就是这样一种集成电路，其逻辑功能由用户对器件编程来确定。由设计人员自行编程来把一个数字系统“集成”在一片PLD上，而不需要请芯片制造厂商来设计和制作专用的集成电路芯片。而且，如今，取代手工地制作集成电路芯片，这种编程也多半改用“逻辑编译器(logic compiler)”软件来实现，它与程序开发撰写时所用的软件编译器相类似，而要编译之前的原始代码也得用特定的编程语言来撰写，此称之为硬件描述语言(Hardware Description Language, HDL)，而HDL也并非仅有一种，而是有许多种，如ABEL(Advanced Boolean Expression Language)、AHDL(Altera Hardware Description Language)、Confluence、CUPL(Cornell University Programming Language)、HDCal、JHDL(Java Hardware Description Language)、Lava、Lola、MyHDL、PALASM、RHDL(Ruby Hardware Description Language)等，目前最普遍使用的是VHDL(Very-High-Speed Integrated Circuit Hardware Description Language)与Verilog。本领域技术人员也应该清楚，只需要将方法流程用上述几种硬件描述语言稍作逻辑编程并编程到集成电路中，就可以很容易得到实现该逻辑方法流程的硬件电路。For the improvement of a technology, it can be clearly distinguished whether it is a hardware improvement (for example, improvement of the circuit structure of diodes, transistors, switches, etc.) or a software improvement (improvement of the method flow). However, with the development of technology, many improvements of the method flow today can be regarded as direct improvements of the hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into the hardware circuit. Therefore, it cannot be said that the improvement of a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (PLD) (such as a field programmable gate array (FPGA)) is such an integrated circuit whose logical function is determined by the user's programming of the device. Designers can "integrate" a digital system on a PLD by programming themselves, without having to ask chip manufacturers to design and make dedicated integrated circuit chips.
Moreover, nowadays, instead of manually making integrated circuit chips, this kind of programming is mostly implemented by "logic compiler" software, which is similar to the software compiler used when developing and writing programs, and the original code before compilation must also be written in a specific programming language, which is called hardware description language (HDL). There is not only one kind of HDL, but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, RHDL (Ruby Hardware Description Language), etc. The most commonly used ones are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also know that it is only necessary to program the method flow slightly in the above-mentioned hardware description languages and program it into the integrated circuit, and then it is easy to obtain the hardware circuit that implements the logic method flow.

控制器可以按任何适当的方式实现，例如，控制器可以采取例如微处理器或处理器以及存储可由该(微)处理器执行的计算机可读程序代码(例如软件或固件)的计算机可读介质、逻辑门、开关、专用集成电路(Application Specific Integrated Circuit, ASIC)、可编程逻辑控制器和嵌入微控制器的形式，控制器的例子包括但不限于以下微控制器：ARC 625D、Atmel AT91SAM、Microchip PIC18F26K20 以及Silicon Labs C8051F320，存储器控制器还可以被实现为存储器的控制逻辑的一部分。本领域技术人员也知道，除了以纯计算机可读程序代码方式实现控制器以外，完全可以通过将方法步骤进行逻辑编程来使得控制器以逻辑门、开关、专用集成电路、可编程逻辑控制器和嵌入微控制器等的形式来实现相同功能。因此这种控制器可以被认为是一种硬件部件，而对其内包括的用于实现各种功能的装置也可以视为硬件部件内的结构。或者甚至，可以将用于实现各种功能的装置视为既可以是实现方法的软件模块又可以是硬件部件内的结构。The controller may be implemented in any suitable manner, for example, the controller may take the form of a microprocessor or processor and a computer-readable medium storing a computer-readable program code (e.g., software or firmware) executable by the (micro)processor, a logic gate, a switch, an application-specific integrated circuit (ASIC), a programmable logic controller, and an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. The memory controller may also be implemented as part of the control logic of the memory. It is also known to those skilled in the art that, in addition to implementing the controller in a purely computer-readable program code manner, the controller may be implemented in the form of a logic gate, a switch, an application-specific integrated circuit, a programmable logic controller, and an embedded microcontroller by logically programming the method steps. Therefore, such a controller may be considered as a hardware component, and the devices for implementing various functions included therein may also be considered as structures within the hardware component. Or even, the devices for implementing various functions may be considered as both software modules for implementing the method and structures within the hardware component.

上述实施例阐明的系统、装置、模块或单元,具体可以由计算机芯片或实体实现,或者由具有某种功能的产品来实现。一种典型的实现设备为计算机。具体的,计算机例如可以为个人计算机、膝上型计算机、蜂窝电话、相机电话、智能电话、个人数字助理、媒体播放器、导航设备、电子邮件设备、游戏控制台、平板计算机、可穿戴设备或者这些设备中的任何设备的组合。The systems, devices, modules or units described in the above embodiments may be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

为了描述的方便,描述以上装置时以功能分为各种单元分别描述。当然,在实施本说明书时可以把各单元的功能在同一个或多个软件和/或硬件中实现。For the convenience of description, the above device is described in various units according to their functions. Of course, when implementing this specification, the functions of each unit can be implemented in the same or multiple software and/or hardware.

本领域内的技术人员应明白,本说明书的实施例可提供为方法、系统、或计算机程序产品。因此,本说明书可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本说明书可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art will appreciate that the embodiments of this specification may be provided as methods, systems, or computer program products. Therefore, this specification may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

本说明书是参照根据本说明书实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。This specification is described with reference to the flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of this specification. It should be understood that each process and/or box in the flowchart and/or block diagram, as well as the combination of the processes and/or boxes in the flowchart and/or block diagram, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

在一个典型的配置中,计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。In a typical configuration, a computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.

内存可能包括计算机可读介质中的非永久性存储器，随机存取存储器(RAM)和/或非易失性内存等形式，如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。The memory may include non-permanent storage in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.

计算机可读介质包括永久性和非永久性、可移动和非可移动媒体，可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括，但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带、磁带磁盘存储或其他磁性存储设备或任何其他非传输介质，可用于存储可以被计算设备访问的信息。按照本文中的界定，计算机可读介质不包括暂存电脑可读媒体(transitory media)，如调制的数据信号和载波。Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。It should also be noted that the terms "include", "comprises" or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or device including a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, commodity or device. In the absence of more restrictions, the elements defined by the sentence "comprises a ..." do not exclude the existence of other identical elements in the process, method, commodity or device including the elements.

本领域技术人员应明白,本说明书的实施例可提供为方法、系统或计算机程序产品。因此,本说明书可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且,本说明书可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art will appreciate that the embodiments of this specification may be provided as methods, systems or computer program products. Therefore, this specification may take the form of a complete hardware embodiment, a complete software embodiment or an embodiment combining software and hardware. Moreover, this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

本说明书可以在由计算机执行的计算机可执行指令的一般上下文中描述,例如程序模块。一般地,程序模块包括执行特定任务或实现特定抽象数据类型的例程、程序、对象、组件、数据结构等等。也可以在分布式计算环境中实践本说明书,在这些分布式计算环境中,由通过通信网络而被连接的远程处理设备来执行任务。在分布式计算环境中,程序模块可以位于包括存储设备在内的本地和远程计算机存储介质中。This specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types. This specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in local and remote computer storage media, including storage devices.

本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于系统实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。Each embodiment in this specification is described in a progressive manner, and the same or similar parts between the embodiments can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the system embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the partial description of the method embodiment.

以上所述仅为本说明书的实施例而已,并不用于限制本说明书。对于本领域技术人员来说,本说明书可以有各种更改和变化。凡在本说明书的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本说明书的权利要求范围之内。The above description is only an embodiment of the present specification and is not intended to limit the present specification. For those skilled in the art, the present specification may have various changes and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification shall be included in the scope of the claims of the present specification.

Claims (10)

1.一种基于拟态结构动态防御的模型训练方法,其特征在于,包括:1. A model training method based on dynamic defense of mimicry structure, characterized by comprising: 获取预训练模型;Get the pre-trained model; 将训练所述预训练模型所使用的第一图像输入到所述预训练模型中,得到所述第一图像对应的识别结果;Inputting a first image used to train the pre-trained model into the pre-trained model to obtain a recognition result corresponding to the first image; 根据所述第一图像对应的识别结果以及所述第一图像对应的实际标签,确定所述第一图像对应的梯度信息;Determining gradient information corresponding to the first image according to a recognition result corresponding to the first image and an actual label corresponding to the first image; 根据所述梯度信息对应梯度方向的反向梯度方向,生成干扰数据,其中,通过预设函数将所述梯度信息的方向转换为反向梯度的方向,并根据转换后的梯度信息以及预设的扰动系数,生成所述干扰数据,所述扰动系数越大,对所述转换后的梯度信息的扰动程度越高;Generate interference data according to the reverse gradient direction of the gradient direction corresponding to the gradient information, wherein the direction of the gradient information is converted into the reverse gradient direction by a preset function, and the interference data is generated according to the converted gradient information and a preset perturbation coefficient, wherein the larger the perturbation coefficient, the higher the degree of perturbation to the converted gradient information; 将所述干扰数据加入到所述第一图像中,得到第二图像;adding the interference data to the first image to obtain a second image; 将所述第二图像输入到所述预训练模型中,以通过所述预训练模型中的权重网络层,确定所述预训练模型中设置的各子识别网络对应的权重,以及通过每个子识别网络,分别对所述第二图像进行识别,得到各识别结果,并根据确定出的所述各子识别网络对应的权重,对所述各识别结果进行加权,得到最终识别结果;Inputting the second image into the pre-trained model, determining the weights corresponding to the sub-recognition networks set in the pre-trained model through the weight network layer in the pre-trained model, and respectively recognizing the second image through each sub-recognition network to obtain each recognition result, and weighting each recognition result according to the determined weights corresponding to each sub-recognition network to obtain a final recognition result; 
以最小化所述最终识别结果与所述实际标签之间的偏差为优化目标,对所述预训练模型进行训练。The pre-training model is trained with the optimization goal of minimizing the deviation between the final recognition result and the actual label. 2.如权利要求1所述的方法,其特征在于,所述各子识别网络中包含有第一子识别网络以及第二子识别网络,所述第一子识别网络用于通过学习到的针对第一图像进行识别的识别规则,对输入到第一子识别网络中的图像进行识别,所述第二子识别网络用于通过学习到的针对第二图像进行识别的识别规则,对输入到第二子识别网络中的图像进行识别;2. The method according to claim 1, wherein each of the sub-recognition networks comprises a first sub-recognition network and a second sub-recognition network, wherein the first sub-recognition network is used to recognize an image input into the first sub-recognition network by using a learned recognition rule for recognizing a first image, and the second sub-recognition network is used to recognize an image input into the second sub-recognition network by using a learned recognition rule for recognizing a second image; 以最小化所述最终识别结果与所述实际标签之间的偏差为优化目标,对所述预训练模型进行训练,具体包括:The pre-training model is trained with minimizing the deviation between the final recognition result and the actual label as the optimization goal, specifically including: 固定所述预训练模型中所述第一子识别网络的网络参数,并以最小化所述最终识别结果与所述实际标签之间的偏差为优化目标,对所述第二子识别网络中的网络参数以及所述权重网络层中的网络参数进行调整。The network parameters of the first sub-recognition network in the pre-trained model are fixed, and the network parameters in the second sub-recognition network and the network parameters in the weight network layer are adjusted with the optimization goal of minimizing the deviation between the final recognition result and the actual label. 3.如权利要求1所述的方法,其特征在于,所述方法还包括:3. 
The method according to claim 1, characterized in that the method further comprises: 当监测到获取与训练所述预训练模型所使用的图像数据的类型不符的图像数据时,生成若干新的子识别网络,并将所述新的子识别网络部署到所述预训练模型中,并根据所述新的子识别网络,对所述权重网络层进行维度扩展,得到更新后的预训练模型;When it is monitored that image data that does not match the type of image data used to train the pre-trained model is obtained, a number of new sub-recognition networks are generated, and the new sub-recognition networks are deployed to the pre-trained model, and the weight network layer is dimensionally expanded according to the new sub-recognition networks to obtain an updated pre-trained model; 将获取到与训练所述预训练模型所使用的图像数据的类型不符的图像数据以及原始的图像数据作为扩充后图像数据输入到所述更新后的预训练模型中,以通过所述更新后的预训练模型中的权重网络层,确定原始的各子识别网络对应的权重和各新的子识别网络对应的权重,以及通过原始的每个子识别网络和每个新的子识别网络,分别对所述扩充后图像数据进行识别,得到各识别结果,并根据确定出的原始的各子识别网络对应的权重和所述各新的子识别网络对应的权重,对所述各识别结果进行加权,得到所述扩充后图像数据对应的识别结果;Inputting the acquired image data that does not conform to the type of image data used to train the pre-trained model and the original image data as expanded image data into the updated pre-trained model, so as to determine the weights corresponding to the original sub-recognition networks and the weights corresponding to the new sub-recognition networks through the weight network layer in the updated pre-trained model, and respectively recognize the expanded image data through each original sub-recognition network and each new sub-recognition network to obtain each recognition result, and weighting the recognition results according to the determined weights corresponding to the original sub-recognition networks and the weights corresponding to the new sub-recognition networks to obtain the recognition result corresponding to the expanded image data; 以最小化所述扩充后图像数据对应的识别结果与所述扩充后图像数据对应的实际标签之间的偏差为优化目标,对所述更新后的预训练模型进行训练。The updated pre-trained model is trained with the optimization goal of minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data. 
4.如权利要求3所述的方法,其特征在于,以最小化所述扩充后图像数据对应的识别结果与所述扩充后图像数据对应的实际标签之间的偏差为优化目标,对所述更新后的预训练模型进行训练,具体包括:4. The method according to claim 3, characterized in that the updated pre-trained model is trained with minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as the optimization goal, specifically comprising: 固定所述预训练模型中原始的各子识别网络的网络参数以及所述权重网络层中针对原始的各子识别网络的维度所对应的网络参数;Fixing the network parameters of the original sub-recognition networks in the pre-trained model and the network parameters corresponding to the dimensions of the original sub-recognition networks in the weight network layer; 以最小化所述扩充后图像数据对应的识别结果与所述扩充后图像数据对应的实际标签之间的偏差为优化目标,对所述各新的子识别网络中的网络参数以及所述权重网络层中针对所述各新的子识别网络的扩展维度对应的网络参数进行调整。With minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as the optimization goal, the network parameters in each new sub-recognition network and the network parameters corresponding to the expanded dimension of each new sub-recognition network in the weight network layer are adjusted. 5.如权利要求3所述的方法,其特征在于,以最小化所述扩充后图像数据对应的识别结果与所述扩充后图像数据对应的实际标签之间的偏差为优化目标,对所述更新后的预训练模型进行训练,具体包括:5. 
The method according to claim 3, characterized in that the updated pre-trained model is trained with minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as the optimization goal, specifically comprising: 针对第N轮次的训练,根据在第N轮次训练中得到的所述扩充后图像数据对应的识别结果与所述扩充后图像数据对应的实际标签之间的偏差,得到第一损失值;For the Nth round of training, a first loss value is obtained according to a deviation between a recognition result corresponding to the expanded image data obtained in the Nth round of training and an actual label corresponding to the expanded image data; 针对每个原始的子识别网络,根据训练前所述预训练模型中原始的该子识别网络针对所述扩充后图像数据所得到的识别结果,与经过第N-1轮次训练后所述预训练模型中原始的该子识别网络针对所述扩充后图像数据所得到的识别结果之间的偏差,确定原始的该子识别网络所对应的第二损失值;For each original sub-recognition network, determine a second loss value corresponding to the original sub-recognition network according to a deviation between a recognition result obtained by the original sub-recognition network in the pre-training model for the expanded image data before training and a recognition result obtained by the original sub-recognition network in the pre-training model for the expanded image data after the N-1th round of training; 根据所述第一损失值和所述第二损失值,得到总损失值,以通过最小化所述总损失值为优化目标,对所述预训练模型进行第N轮次的训练。A total loss value is obtained according to the first loss value and the second loss value, and the pre-trained model is trained for an Nth round by minimizing the total loss value as an optimization target. 6.一种基于拟态结构动态防御的模型训练装置,其特征在于,包括:6. 
A model training device based on dynamic defense of mimicry structure, characterized by comprising: 获取模块,用于获取预训练模型;Acquisition module, used to obtain pre-trained models; 生成模块,用于将训练所述预训练模型所使用的第一图像输入到所述预训练模型中,得到所述第一图像对应的识别结果;根据所述第一图像对应的识别结果以及所述第一图像对应的实际标签,确定所述第一图像对应的梯度信息;根据所述梯度信息对应梯度方向的反向梯度方向,生成干扰数据,其中,通过预设函数将所述梯度信息的方向转换为反向梯度的方向,并根据转换后的梯度信息以及预设的扰动系数,生成所述干扰数据,所述扰动系数越大,对所述转换后的梯度信息的扰动程度越高;将所述干扰数据加入到所述第一图像中,得到第二图像;A generation module, used to input the first image used to train the pre-trained model into the pre-trained model to obtain a recognition result corresponding to the first image; determine the gradient information corresponding to the first image according to the recognition result corresponding to the first image and the actual label corresponding to the first image; generate interference data according to the reverse gradient direction of the gradient direction corresponding to the gradient information, wherein the direction of the gradient information is converted into the reverse gradient direction by a preset function, and the interference data is generated according to the converted gradient information and a preset perturbation coefficient, wherein the larger the perturbation coefficient, the higher the degree of perturbation to the converted gradient information; and add the interference data to the first image to obtain a second image; 加权模块,用于将所述第二图像输入到所述预训练模型中,以通过所述预训练模型中的权重网络层,确定所述预训练模型中设置的各子识别网络对应的权重,以及通过每个子识别网络,分别对所述第二图像进行识别,得到各识别结果,并根据确定出的所述各子识别网络对应的权重,对所述各识别结果进行加权,得到最终识别结果;A weighting module, used for inputting the second image into the pre-trained model, determining the weights corresponding to the sub-recognition networks set in the pre-trained model through the weight network layer in the pre-trained model, and respectively recognizing the second image through each sub-recognition network to obtain each recognition result, and weighting each recognition result according to the determined weights corresponding to each sub-recognition network to 
obtain a final recognition result; 训练模块,用于以最小化所述最终识别结果与所述实际标签之间的偏差为优化目标,对所述预训练模型进行训练。The training module is used to train the pre-training model with the optimization goal of minimizing the deviation between the final recognition result and the actual label. 7.如权利要求6所述的装置,其特征在于,所述各子识别网络中包含有第一子识别网络以及第二子识别网络,所述第一子识别网络用于通过学习到的针对第一图像进行识别的识别规则,对输入到第一子识别网络中的图像进行识别,所述第二子识别网络用于通过学习到的针对第二图像进行识别的识别规则,对输入到第二子识别网络中的图像进行识别;7. The device according to claim 6, characterized in that each of the sub-recognition networks includes a first sub-recognition network and a second sub-recognition network, the first sub-recognition network is used to recognize the image input to the first sub-recognition network by using the learned recognition rule for recognizing the first image, and the second sub-recognition network is used to recognize the image input to the second sub-recognition network by using the learned recognition rule for recognizing the second image; 所述训练模块具体用于,固定所述预训练模型中所述第一子识别网络的网络参数,并以最小化所述最终识别结果与所述实际标签之间的偏差为优化目标,对所述第二子识别网络中的网络参数以及所述权重网络层中的网络参数进行调整。The training module is specifically used to fix the network parameters of the first sub-recognition network in the pre-trained model, and adjust the network parameters in the second sub-recognition network and the network parameters in the weight network layer with the optimization goal of minimizing the deviation between the final recognition result and the actual label. 8.如权利要求6所述的装置,其特征在于,所述训练模块还用于,当监测到获取与训练所述预训练模型所使用的图像数据的类型不符的图像数据时,生成若干新的子识别网络,并将所述新的子识别网络部署到所述预训练模型中,并根据所述新的子识别网络,对所述权重网络层进行维度扩展,得到更新后的预训练模型;将获取到与训练所述预训练模型所使用的图像数据的类型不符的图像数据以及原始的图像数据作为扩充后图像数据输入到所述更新后的预训练模型中,以通过所述更新后的预训练模型中的权重网络层,确定原始的各子识别网络对应的权重和各新的子识别网络对应的权重,以及通过原始的每个子识别网络和每个新的子识别网络,分别对所述扩充后图像数据进行识别,得到各识别结果,并根据确定出的原始的各子识别网络对应的权重和所述各新的子识别网络对应的权重,对所述各识别结果进行加权,得到所述扩充后图像数据对应的识别结果;以最小化所述扩充后图像数据对应的识别结果与所述扩充后图像数据对应的实际标签之间的偏差为优化目标,对所述更新后的预训练模型进行训练。8. 
The device as described in claim 6, characterized in that the training module is further used to: when it monitors the acquisition of image data whose type does not match the image data used to train the pre-trained model, generate a number of new sub-recognition networks, deploy the new sub-recognition networks into the pre-trained model, and expand the dimensions of the weight network layer according to the new sub-recognition networks to obtain an updated pre-trained model; and input the mismatched image data together with the original image data, as expanded image data, into the updated pre-trained model, so that the weight network layer determines the weights corresponding to the original sub-recognition networks and the weights corresponding to the new sub-recognition networks, each original and new sub-recognition network recognizes the expanded image data to obtain its recognition result, and the recognition results are weighted according to the determined weights to obtain the recognition result corresponding to the expanded image data; the updated pre-trained model is then trained with the optimization goal of minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data. 9. A computer-readable storage medium, characterized in that the storage medium stores a computer program, and when the computer program is executed by a processor, the method according to any one of claims 1 to 5 is implemented. 10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 5 when executing the program.
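The claimed mechanism, stripped to its essentials, is a weighted ensemble whose weight layer can grow when new sub-recognition networks are added. The sketch below is a minimal illustration under stated assumptions, not the patented implementation: the class name `MimicryEnsemble` is invented, the sub-recognition networks are stood in for by fixed random linear maps, and the weight network layer is reduced to a softmax over one logit per sub-network.

```python
import numpy as np

rng = np.random.default_rng(0)

class MimicryEnsemble:
    """Toy weighted ensemble of sub-recognition networks (illustrative only)."""

    def __init__(self, n_subnets, in_dim, n_classes):
        # stand-ins for the sub-recognition networks: fixed linear maps
        self.subnets = [rng.normal(size=(in_dim, n_classes))
                        for _ in range(n_subnets)]
        # stand-in for the weight network layer: one logit per sub-network
        self.weight_logits = np.zeros(n_subnets)

    def weights(self):
        # softmax-normalise the logits into mixing weights
        z = np.exp(self.weight_logits - self.weight_logits.max())
        return z / z.sum()

    def predict(self, x):
        # each sub-network recognizes the input independently ...
        per_net = np.stack([x @ w for w in self.subnets])
        # ... and the weight layer fuses the per-network results
        return np.tensordot(self.weights(), per_net, axes=1)

    def expand(self, n_new):
        # dimension expansion: add new sub-networks and grow the
        # weight layer so it also covers the newcomers
        in_dim, n_classes = self.subnets[0].shape
        self.subnets += [rng.normal(size=(in_dim, n_classes))
                         for _ in range(n_new)]
        self.weight_logits = np.concatenate(
            [self.weight_logits, np.zeros(n_new)])

ens = MimicryEnsemble(n_subnets=3, in_dim=4, n_classes=2)
y = ens.predict(np.ones(4))       # fused recognition result, shape (2,)
ens.expand(2)                     # mimic arrival of mismatched image data
y2 = ens.predict(np.ones(4))      # weight layer now spans 5 sub-networks
```

In the claim, the logits would themselves be produced by a trained weight network layer and the sub-networks would be full recognition models; here both are collapsed to the smallest structure that still shows the weighted fusion and the dimension expansion step.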
CN202410076456.XA 2024-01-18 2024-01-18 A model training method and device based on dynamic defense of mimicry structure Active CN117576522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410076456.XA CN117576522B (en) 2024-01-18 2024-01-18 A model training method and device based on dynamic defense of mimicry structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410076456.XA CN117576522B (en) 2024-01-18 2024-01-18 A model training method and device based on dynamic defense of mimicry structure

Publications (2)

Publication Number Publication Date
CN117576522A CN117576522A (en) 2024-02-20
CN117576522B true CN117576522B (en) 2024-04-26

Family

ID=89886792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410076456.XA Active CN117576522B (en) 2024-01-18 2024-01-18 A model training method and device based on dynamic defense of mimicry structure

Country Status (1)

Country Link
CN (1) CN117576522B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308113A (en) * 2020-09-23 2021-02-02 济南浪潮高新科技投资发展有限公司 Target identification method, device and medium based on semi-supervision
CN112784857A (en) * 2021-01-29 2021-05-11 北京三快在线科技有限公司 Model training and image processing method and device
WO2021196401A1 (en) * 2020-03-31 2021-10-07 北京市商汤科技开发有限公司 Image reconstruction method and apparatus, electronic device and storage medium
WO2022002059A1 (en) * 2020-06-30 2022-01-06 北京灵汐科技有限公司 Initial neural network training method and apparatus, image recognition method and apparatus, device, and medium
CN114898091A (en) * 2022-04-14 2022-08-12 南京航空航天大学 A method and device for generating image adversarial samples based on region information
CN117197781A (en) * 2023-11-03 2023-12-08 之江实验室 Traffic sign recognition method and device, storage medium and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021196401A1 (en) * 2020-03-31 2021-10-07 北京市商汤科技开发有限公司 Image reconstruction method and apparatus, electronic device and storage medium
WO2022002059A1 (en) * 2020-06-30 2022-01-06 北京灵汐科技有限公司 Initial neural network training method and apparatus, image recognition method and apparatus, device, and medium
CN112308113A (en) * 2020-09-23 2021-02-02 济南浪潮高新科技投资发展有限公司 Target identification method, device and medium based on semi-supervision
CN112784857A (en) * 2021-01-29 2021-05-11 北京三快在线科技有限公司 Model training and image processing method and device
CN114898091A (en) * 2022-04-14 2022-08-12 南京航空航天大学 A method and device for generating image adversarial samples based on region information
CN117197781A (en) * 2023-11-03 2023-12-08 之江实验室 Traffic sign recognition method and device, storage medium and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liu Shangzheng; Liu Bin. Design of a cross-modal recognition system for image category labels based on generative adversarial networks. Modern Electronics Technique. 2020, (08), full text. *
Qi Tianhui; Zhang Hui; Li Jiafeng; Zhuo Li. Siamese-network visual target tracking based on multiple attention maps. Journal of Signal Processing. 2020, (09), full text. *

Also Published As

Publication number Publication date
CN117576522A (en) 2024-02-20

Similar Documents

Publication Publication Date Title
EP3608822B1 (en) Method and apparatus for detecting model security and electronic device
CN113095124A (en) Face living body detection method and device and electronic equipment
CN116502176A (en) A language model pre-training method, device, medium and electronic equipment
CN114332873B (en) A training method and device for recognition model
WO2024060852A1 (en) Model ownership verification method and apparatus, storage medium and electronic device
CN111753878A (en) Network model deployment method, equipment and medium
CN117036870B (en) A model training and image recognition method based on integral gradient diversity
CN116543264A (en) Training method of image classification model, image classification method and device
CN117011718A (en) Plant leaf fine granularity identification method and system based on multiple loss fusion
CN117036829A (en) Method and system for achieving label enhancement based on prototype learning for identifying fine granularity of blade
CN115862675B An emotion recognition method, apparatus, device and storage medium
CN116386894A (en) Information tracing method and device, storage medium and electronic equipment
CN111523539A (en) Character detection method and device
CN117576522B (en) A model training method and device based on dynamic defense of mimicry structure
CN117197781B (en) Traffic sign recognition method, device, storage medium and electronic device
CN116434787B (en) Voice emotion recognition method and device, storage medium and electronic equipment
CN115034861B (en) Learning method, device and equipment for long tail distribution
WO2024221599A1 (en) Method and apparatus for predicting parking-space idle rate, and medium and device
CN116363418A (en) Method and device for training classification model, storage medium and electronic equipment
CN113673436A (en) Behavior recognition and model training method and device
CN117746193B (en) Label optimization method and device, storage medium and electronic equipment
CN117036869B (en) A model training method and device based on diversity and random strategies
CN116451808B (en) A method, device, storage medium and electronic equipment for model training
CN118193797B (en) A method, device, storage medium and electronic device for executing business
CN118468045B (en) A model training acceleration method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant