CN115049108A - Multitask model training method, multitask prediction method, related device and medium - Google Patents

Info

Publication number
CN115049108A
CN115049108A
Authority
CN
China
Prior art keywords: model, sub, task, module, sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210552195.5A
Other languages
Chinese (zh)
Other versions
CN115049108B (en)
Inventor
宿嘉颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202210552195.5A priority Critical patent/CN115049108B/en
Publication of CN115049108A publication Critical patent/CN115049108A/en
Application granted granted Critical
Publication of CN115049108B publication Critical patent/CN115049108B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Human Resources & Organizations (AREA)
  • Game Theory and Decision Science (AREA)
  • Medical Informatics (AREA)
  • Development Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of this specification disclose a multi-task model training method, a multi-task prediction method, a related device, and a medium. The training method comprises: determining the parameter weights of sample features in the attention module of the m-th sub-model; determining the parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model; and training the multi-task model based on the parameter weights in the attention module of the m-th sub-model and the parameters in the attention module of the (m+1)-th sub-model. The multi-task model assigns one sub-model to each task and, through the attention modules between adjacent tasks, transfers the information associated with the preceding task to the following task, so that the following task can combine that information to produce a more accurate and more relevant prediction result.

Description

Multi-task model training method, multi-task prediction method, related device and medium

Technical Field

This specification belongs to the technical field of data processing, and in particular relates to a multi-task model training method, a multi-task prediction method, a related device, and a medium.

Background Art

In general, a task can be predicted by inputting the data corresponding to the task into a trained model. Such a model can be trained on patterns learned from task data; when a complex task is involved, it can first be decomposed into multiple tasks that are trained separately, and the final prediction result is obtained by combining the results corresponding to those tasks.

Since multiple tasks may be correlated with one another, prediction results obtained by the traditional method lose this correlation information; a technical solution with higher prediction accuracy is therefore needed.

Summary of the Invention

The embodiments of this specification provide a multi-task model training method, a multi-task prediction method, a related device, and a medium. The technical solutions are as follows:

In a first aspect, the embodiments of this specification provide a multi-task model training method. The multi-task model includes M sub-models, each sub-model corresponds to one task, and each sub-model includes an attention module. The method includes:

determining the parameter weights of sample features in the attention module of the m-th sub-model, where m is a positive integer less than M;

determining the parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model, where the sample features include sample user features, sample product features, and the user's sample result for the task corresponding to the (m+1)-th sub-model;

training the multi-task model based on the sample features, the parameter weights in the attention module of the m-th sub-model, and the parameters in the attention module of the (m+1)-th sub-model.
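The three steps of this first aspect can be sketched as a minimal NumPy illustration. The patent does not fix a concrete architecture, so the scoring function, the blending rule in `transfer`, and all shapes below are hypothetical stand-ins:

```python
import numpy as np

def attention_weights(features, attn_params):
    """Score each attention parameter row against the sample features,
    a stand-in for 'determining the parameter weights'."""
    scores = attn_params @ features                  # one raw score per parameter
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax normalisation
    return weights

def transfer(weights_m, params_m, params_m1, alpha=0.5):
    """Derive the (m+1)-th attention parameters from the m-th parameter
    weights: blend in the m-th parameters in proportion to their weights."""
    return params_m1 + alpha * weights_m[:, None] * params_m

rng = np.random.default_rng(0)
features = rng.normal(size=8)                        # sample user + product features
attn = [rng.normal(size=(4, 8)) for _ in range(3)]   # attention params, M = 3

for m in range(len(attn) - 1):                       # m < M
    w_m = attention_weights(features, attn[m])       # step 1: parameter weights
    attn[m + 1] = transfer(w_m, attn[m], attn[m + 1])  # step 2: (m+1)-th params
    # step 3 would then update all parameters against the per-task losses
```

The blending coefficient `alpha` is invented for the sketch; any rule that makes the (m+1)-th parameters depend on the m-th weights fits the description.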

In a second aspect, the embodiments of this specification provide a multi-task prediction method. The method is applied to a multi-task model; the multi-task model includes M sub-models, each sub-model corresponds to one task, and each sub-model includes an attention module. The method includes:

determining the parameter weights of target features in the attention module of the m-th sub-model, where m is a positive integer less than M;

determining the parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model, where the target features include target user features and target product features;

obtaining the prediction result of the task corresponding to the (m+1)-th sub-model based on the target features and the parameters in the attention module of the (m+1)-th sub-model.
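At prediction time, the weight transfer can run once per adjacent pair of sub-models before each task head produces its result. The sketch below is hypothetical: the softmax scoring, the blending step, and the sigmoid placeholder head are assumptions, since the patent leaves the sub-model internals open:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_chain(features, attn_params, alpha=0.5):
    """Run M sub-models in order: derive the next attention parameters
    from the current parameter weights, then predict each task."""
    preds = []
    params = [p.copy() for p in attn_params]
    for m in range(len(params)):
        weights = softmax(params[m] @ features)          # parameter weights
        if m + 1 < len(params):                          # transfer to m+1
            params[m + 1] += alpha * weights[:, None] * params[m]
        logit = weights @ (params[m] @ features)         # placeholder head
        preds.append(1.0 / (1.0 + np.exp(-logit)))       # task probability
    return preds

rng = np.random.default_rng(1)
feats = rng.normal(size=6)                        # target user + product features
sub = [rng.normal(size=(3, 6)) for _ in range(3)] # M = 3 sub-models / tasks
probs = predict_chain(feats, sub)                 # one prediction result per task
```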

In a third aspect, the embodiments of this specification provide a multi-task model training apparatus. The multi-task model includes M sub-models, each sub-model corresponds to one task, and each sub-model includes an attention module. The training apparatus includes:

a first processing module, configured to determine the parameter weights of sample features in the attention module of the m-th sub-model, where m is a positive integer less than M;

a second processing module, configured to determine the parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model, where the sample features include sample user features, sample product features, and the user's sample result for the task corresponding to the (m+1)-th sub-model;

a training module, configured to train the multi-task model based on the sample features, the parameter weights in the attention module of the m-th sub-model, and the parameters in the attention module of the (m+1)-th sub-model.

In a fourth aspect, the embodiments of this specification further provide a multi-task model training apparatus, which may include a processor and a memory, where the memory stores a computer program adapted to be loaded by the processor to execute the steps of the above multi-task model training method.

In a fifth aspect, the embodiments of this specification provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor to execute the steps of the above multi-task model training method.

In a sixth aspect, the embodiments of this specification provide a multi-task prediction apparatus. The apparatus is applied to a multi-task model; the multi-task model includes M sub-models, each sub-model corresponds to one task, and each sub-model includes an attention module. The apparatus includes:

a third processing module, configured to determine the parameter weights of target features in the attention module of the m-th sub-model, where m is a positive integer less than M;

a fourth processing module, configured to determine the parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model, where the target features include target user features and target product features;

a prediction module, configured to obtain the prediction result of the task corresponding to the (m+1)-th sub-model based on the target features and the parameters in the attention module of the (m+1)-th sub-model.

In a seventh aspect, the embodiments of this specification further provide a multi-task prediction apparatus, which may include a processor and a memory, where the memory stores a computer program adapted to be loaded by the processor to execute the steps of the above multi-task prediction method.

In an eighth aspect, the embodiments of this specification further provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor to execute the steps of the above multi-task prediction method.

The beneficial effects of the technical solutions provided by some embodiments of this specification include at least the following:

In one or more embodiments of this specification, when training the multi-task model, the parameter weights of the sample features in the attention module of the m-th sub-model are first determined; the parameters in the attention module of the (m+1)-th sub-model are determined according to those parameter weights; and the multi-task model is then trained based on the parameter weights in the attention module of the m-th sub-model and the parameters in the attention module of the (m+1)-th sub-model. The multi-task model assigns one sub-model to each task and, through the attention modules between adjacent tasks, transfers the information associated with the preceding task to the following task, so that the following task can combine that information to produce a more accurate and more relevant prediction result.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the prediction flow of an existing multi-task model provided by an embodiment of this specification;

FIG. 2 is a schematic diagram of the prediction flow of a multi-task model provided by an embodiment of this specification;

FIG. 3 is a schematic flowchart of a multi-task model training method provided by an embodiment of this specification;

FIG. 4 is a schematic structural diagram of a sub-model provided by an embodiment of this specification;

FIG. 5 is a schematic structural diagram of a multi-task model provided by an embodiment of this specification;

FIG. 6 is a schematic flowchart of a multi-task prediction method provided by an embodiment of this specification;

FIG. 7 is a schematic structural diagram of a multi-task model training apparatus provided by an embodiment of this specification;

FIG. 8 is a schematic structural diagram of a multi-task prediction apparatus provided by an embodiment of this specification;

FIG. 9 is a schematic structural diagram of another multi-task model training apparatus provided by an embodiment of this specification;

FIG. 10 is a schematic structural diagram of another multi-task prediction apparatus provided by an embodiment of this specification.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings in the embodiments of this specification.

The terms "first", "second", "third", and the like in the description and claims of the present application and in the above drawings are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.

When a typical lifestyle application is used, it provides the user with recommendation services tailored to the actions the user performs. Take a third-party application with a payment function as an example: the user can browse the application's recommendation interface, click on relevant information of interest, and pay to purchase it. This whole process can be divided into multiple tasks. For example, the user browsing the recommendation interface corresponds to an exposure task for the relevant information of interest; the user clicking on that information corresponds to a click task; and the user paying to purchase it corresponds to a purchase task. It can be understood that the exposure, click, and purchase tasks mentioned above are sequentially dependent: the click task can occur only after the exposure task has occurred, and the purchase task only after the click task has occurred, and adjacent tasks are correlated.

Of course, the number of tasks corresponding to the actions the user performs is not limited to the three mentioned above. For example, in a shopping third-party application, when placing an order the user may first browse the application, then click on an item that meets their needs, then collect a deduction coupon from the item's store, then redeem the coupon against the item, and finally place the order for the redeemed item. Here, the user browsing the application corresponds to an exposure task for the item that meets the user's needs; clicking on the item corresponds to a click task; collecting a deduction coupon from the item's store corresponds to a coupon-collecting task; redeeming the collected coupon against the item corresponds to a redemption task; and placing the order after redemption corresponds to an order-placing task.

Based on this, to obtain a prediction of the user's demand, model learning is generally used to train on the multiple tasks corresponding to that demand, and the final prediction result is obtained by combining the prediction results of the individual tasks.

Specifically, a common way of predicting multiple tasks is to divide the problem into multiple tasks, learn each task separately, and combine the prediction results corresponding to the tasks to obtain the final multi-task prediction result. Each task can be learned through a task model: for example, the sample feature vector corresponding to each task can be input into the task model for training, so that inputting each task's feature vector into the trained task model yields the corresponding prediction result. It can be understood that one task model can be set up for each task to ensure the accuracy of each task model's prediction result. Taking a multi-task problem divided into three tasks A, B, and C as an example: inputting task A into its trained task model yields prediction result a, inputting task B yields prediction result b, and inputting task C yields prediction result c. The final multi-task prediction result can then be obtained by combining prediction results a, b, and c, for example (but not limited to) as a × b × c.
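The scalar combination described above amounts to a plain product of the per-task results; a minimal illustration with made-up probabilities:

```python
# Per-task prediction results from three separately trained task models
# (illustrative numbers only).
pred_a, pred_b, pred_c = 0.6, 0.3, 0.5

# Traditional combination: a plain scalar product. This is exactly the step
# that discards any correlation information between the tasks.
final = pred_a * pred_b * pred_c
assert abs(final - 0.09) < 1e-9
```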

Reference may also be made to FIG. 1, a schematic diagram of the prediction flow of an existing multi-task model provided by an embodiment of this specification. As shown in FIG. 1, take as an example a multi-task model containing two sub-models, where the multi-task corresponding to an event includes a first task and a second task. The prediction result of the first task can be expressed as the predicted probability that the event is clicked; the prediction result of the second task as the predicted probability that the event, if clicked, is converted; and the multi-task prediction result as the predicted probability that the event is clicked and then converted. Specifically, when predicting the occurrence probability of the multi-task corresponding to an event, the feature vector corresponding to the multi-task is first input into the first model to obtain the first prediction result corresponding to the first task. The first model can be a sub-model of the multi-task model corresponding to the task model of the first task, trained on sample features with known prediction results. The feature vector corresponding to the multi-task is then input into the second model to obtain the second prediction result corresponding to the second task. The second model can likewise be a sub-model of the multi-task model corresponding to the task model of the second task, trained on sample features with known prediction results. Further, the target prediction result of the multi-task can be obtained by combining the first prediction result corresponding to the first task and the second prediction result corresponding to the second task, for example by multiplying the two. It can be understood that the first and second prediction results are not restricted to this order: the second prediction result can also be obtained before the first, or both can be obtained simultaneously.

It can be seen that the multi-task prediction approach described above does not take the strong correlation between the tasks into account; the final prediction result is obtained by simple scalar multiplication, which loses the correlation information and easily harms the accuracy of the final prediction result.

Next, to better solve the above technical problems, one or more embodiments of this specification are explained.

Please refer to FIG. 2, which shows a schematic diagram of the prediction flow of a multi-task model provided by an embodiment of this specification. As shown in FIG. 2, the multi-task model may include two sub-models (denoted the first model and the second model), and the corresponding event may include a first task and a second task. Here the first model corresponds to predicting the occurrence probability of the first task, which, as mentioned above, can be expressed as the predicted probability that the event is clicked; the second model corresponds to predicting the occurrence probability of the second task, which can be expressed as the predicted probability that the event, if clicked, is converted. The multi-task prediction result can then be expressed as the predicted probability that the event is clicked and then converted.

When predicting the occurrence probability of the multi-task corresponding to an event, the feature vector corresponding to the multi-task is first input into the first model to obtain the parameter weights in the attention module of the first model. The first model may include a first module and an attention module. Specifically, the feature vector corresponding to the multi-task is first input into the first module to obtain first transformation information, and the first transformation information is then input into the attention module to obtain the parameter weights in the attention module of the first model. It can be understood that these parameter weights characterize the parameters corresponding to the association information between the first task and the second task, and may correspond to a subset of all the parameters in the attention module of the first model. For example, if all the parameters in the attention module of the first model are denoted A, B, C, D, and E, the parameter weights may be (but are not limited to) (0, 1, 1, 0, 0), meaning that the parameters corresponding to the association information between the first task and the second task are B and C.

The feature vector corresponding to the multi-task mentioned here may include, but is not limited to, user features and task product features. User features can be understood as feature information of the target user who is to perform the event, such as the target user's identity information, which may specifically include any of the user name, user category, user address, or other information. Taking an event executed in a shopping third-party application as an example, the target user's feature information can be determined by querying the user information that the target user has filled in; in addition to the user name, user category, and user address mentioned above, it may also include the target user's browsing or purchase records within a preset period, and may also record exchange coupons or commodity coupons the user has already collected. This embodiment is not limited thereto.

Task product features can be understood as feature information characterizing the product corresponding to the event, such as any of the event's product name, product production information, or product definition information, where product definition information can be understood as information characterizing the product's function. Taking an event executed in a shopping third-party application as an example, once a product to be purchased is determined, the feature information corresponding to that product, such as the product name, production time, and function mentioned above, can be queried at the store where the product is located, and preferential information corresponding to the product may also be recorded. This embodiment is not limited thereto.

Further, after the parameter weights in the attention module of the first model are obtained, they can be passed to the attention module of the second model, so that the parameters in the attention module of the second model are determined according to the parameter weights in the attention module of the first model. The second model may include a second module and an attention module, and the attention module of the second model may adjust its own parameters according to the received parameter weights of the first model's attention module, so that the adjusted parameters in the attention module of the second model carry the association information between the first task and the second task. Suppose, for example, that the parameters in the attention module of the second model are B, D, E, and F, and that the parameters corresponding to the parameter weights in the attention module of the first model are B and C; the adjusted parameters in the attention module of the second model are then B, C, D, E, and F. It can be understood that the structure of the attention module of the second model may be identical to that of the first model, while their respective parameters differ.
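The parameter transfer described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: it assumes parameters are stored in a dict keyed by name, that a 0/1 weight mask marks which of the first model's attention parameters carry cross-task association information, and the helper name `transfer_attention_params` is hypothetical.

```python
# Minimal sketch of the parameter-weight transfer described above.
# Assumptions (not from the patent text): parameters are stored in a
# dict keyed by name, and a 0/1 weight mask flags which of the first
# model's attention parameters carry cross-task association information.

def transfer_attention_params(first_params, first_weights, second_params):
    """Copy the weighted (mask == 1) parameters of the first model's
    attention module into the second model's attention module."""
    merged = dict(second_params)  # keep the second model's own parameters
    for name, weight in first_weights.items():
        if weight:  # only parameters flagged by the weight mask are passed on
            merged[name] = first_params[name]
    return merged

# Mirrors the example in the text: the second model holds B, D, E, F;
# the first model's weighted parameters are B and C (values illustrative).
first_params  = {"A": 0.1, "B": 0.7, "C": 0.3, "D": 0.2, "E": 0.5}
first_weights = {"A": 0, "B": 1, "C": 1, "D": 0, "E": 0}  # i.e. (0, 1, 1, 0, 0)
second_params = {"B": 0.9, "D": 0.4, "E": 0.6, "F": 0.8}

adjusted = transfer_attention_params(first_params, first_weights, second_params)
```

After the transfer, the adjusted module holds B, C, D, E, and F, with B and C taking the first model's values, matching the example in the text.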

It can also be understood that the parameter weights in the attention module of the first model may be passed to the attention module of the second model through, but not limited to, a connection module, for example a fully connected layer, so as to preserve the integrity of the first model's attention parameter weights during the transfer.

Further, after the parameters in the attention module of the second model are determined, the feature vector corresponding to the multiple tasks can be input into the second model, and the occurrence probability of the multiple tasks corresponding to the event can be predicted through the attention module of the second model. As mentioned above, the second model includes a second module and an attention module. Specifically, the feature vector corresponding to the multiple tasks can first be input into the second module to obtain second conversion information, and the second conversion information can then be input into the attention module whose parameters have been adjusted, directly yielding the occurrence probability of the multiple tasks corresponding to the event. It can be understood that the prediction result obtained from the second model incorporates the association information between the first task (corresponding to the first model) and the second task (corresponding to the second model). Compared with first obtaining the prediction result of each sub-model's task separately and then combining those results into a final multi-task prediction, the prediction process here only needs to produce a single final result, which not only reduces data processing time but also retains the association information between adjacent tasks in the final prediction result, making that result more accurate.

Of course, this embodiment may include, but is not limited to, a multi-task model with two sub-models. Taking the case where the multiple tasks mentioned above include an exposure task, a click task, a coupon-claiming task, a write-off task, and an order task as an example, the multi-task model may also be configured with five sub-models (denoted as the first, second, third, fourth, and fifth models respectively), where the first model corresponds to predicting the occurrence probability of the exposure task, the second to that of the click task, the third to that of the coupon-claiming task, the fourth to that of the write-off task, and the fifth to that of the order task, and each model may include an attention module. Specifically, the feature vector corresponding to the multiple tasks can first be input into the first model to obtain the parameter weights in the attention module of the first model, and these parameter weights can be passed to the attention module of the second model to adjust the parameters in the attention module of the second model.
Further, the feature vector corresponding to the multiple tasks can then be input into the second model to obtain the parameter weights in its attention module, which are passed to the attention module of the third model to adjust its parameters. Likewise, the feature vector can be input into the third model to obtain the parameter weights in its attention module, which are passed to the attention module of the fourth model to adjust its parameters; the feature vector can then be input into the fourth model to obtain the parameter weights in its attention module, which are passed to the attention module of the fifth model to adjust its parameters. Finally, the feature vector corresponding to the multiple tasks can be input into the fifth model, and the final prediction result is obtained through the attention module of the fifth model.
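The five-model chain above can be summarized as a loop that reads each sub-model's attention weights and passes them to the next sub-model. The sketch below is a dataflow illustration only: the `SubModel` class and its method names are hypothetical stand-ins, and the "weights" are simple tags rather than real tensors.

```python
# Hypothetical sketch of the sequential weight-passing chain described
# above. Each sub-model is a stub whose "attention weights" are a tag;
# the point is the dataflow: model m's weights condition model m+1.

class SubModel:
    def __init__(self, task_name):
        self.task_name = task_name
        self.received = None  # weights passed in from the previous sub-model

    def get_attention_weights(self, features):
        # stand-in for running the model and reading its attention weights
        return f"weights[{self.task_name}]"

    def adjust(self, prev_weights):
        # stand-in for adjusting this model's attention parameters
        self.received = prev_weights

    def predict(self, features):
        # stand-in for the final probability produced by the last sub-model
        return {"task": self.task_name, "conditioned_on": self.received}

tasks = ["exposure", "click", "claim_coupon", "write_off", "order"]
models = [SubModel(t) for t in tasks]
features = ["user_features", "product_features"]

# Chain: pass model m's attention weights into model m+1.
for m in range(len(models) - 1):
    w = models[m].get_attention_weights(features)
    models[m + 1].adjust(w)

result = models[-1].predict(features)
```

The final prediction thus comes from the fifth (order-task) sub-model, conditioned on the weights handed forward from the fourth.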

Please refer to FIG. 3, which is a schematic flowchart of a multi-task model training method provided by an embodiment of this specification.

As shown in FIG. 3, the multi-task model training method may include at least the following steps:

Step 302: determine the parameter weights of the sample features in the attention module of the m-th sub-model.

When the occurrence probability of multiple tasks needs to be predicted, the target features corresponding to the multiple tasks can be input into the trained multi-task model mentioned in this embodiment, so as to obtain a more accurate prediction result. For this reason, the process of training the multi-task model mentioned in this embodiment is particularly important.

Specifically, before the multi-task model is trained, the number of tasks among the multiple tasks corresponding to the sample features, as well as the sample result corresponding to each task, can be determined. The sample features may include sample user features, sample product features, and the user's sample result for each task. The sample user features can be understood as sample feature information of the user who is to execute the event, such as the user's sample identity information, which may specifically include at least one of a user name, a user category, or a user address. The sample product features can be understood as sample feature information characterizing the product corresponding to the event, such as at least one of the event's sample product name, sample product production information, or sample product definition information, where the sample product definition information can be understood as information characterizing the product's functionality. It can be understood that a user's demand for a sample product may correspond to multiple tasks, and each task may correspond to a known sample result; this sample result may be represented as, but is not limited to, the character 0 when the corresponding task is determined not to have occurred, and the character 1 when the corresponding task is determined to have occurred.
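As a concrete illustration of the 0/1 sample results described above, one training sample might be encoded as follows. The field names and values are hypothetical; the patent specifies only that each task's sample result is 0 (did not occur) or 1 (occurred).

```python
# Hypothetical encoding of one training sample: per-task 0/1 sample
# results alongside the sample user and sample product features.
sample = {
    "user_features":    {"name": "user_a", "category": "regular", "address": "addr_1"},
    "product_features": {"name": "product_x", "production_info": "info_1", "definition": "def_1"},
    # 1 = the task occurred for this user, 0 = it did not
    "task_results": {"exposure": 1, "click": 1, "claim_coupon": 0, "write_off": 0, "order": 0},
}

# The tasks that actually occurred for this sample user
occurred = [t for t, r in sample["task_results"].items() if r == 1]
```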

After the number of tasks among the multiple tasks corresponding to the sample features and the sample result corresponding to each task are determined, the parameter weights in the attention module of the model corresponding to the m-th task can be obtained from the sample user features and the sample product features. The multi-task model of this embodiment may include M sub-models, each sub-model may include one attention module, each task among the multiple tasks corresponding to the sample features may correspond to one sub-model, and adjacent tasks may correspond to two adjacent sub-models. For example, the first task among the multiple tasks corresponding to the sample features may correspond to the first sub-model of the multi-task model, the second task to the second sub-model, and the m-th task to the m-th sub-model; that is, the parameter weights in the attention module of the model corresponding to the m-th task, mentioned above, can be understood as the parameter weights in the attention module of the m-th sub-model of the multi-task model. Here M may be a positive integer greater than m; for example, when m is 2, M may be a positive integer greater than 2.

It can be understood that, since the m-th task among the multiple tasks corresponding to the sample features may correspond to the m-th sub-model of the multi-task model, the parameter weights in the attention module of the m-th sub-model can be used to identify the parameters corresponding to the association information between the m-th task and the (m+1)-th task, which may be a subset of all the parameters in the attention module of the sub-model corresponding to the m-th task. Suppose, for example, that all the parameters in that attention module are A, B, C, D, and E; the parameter weights in the attention module may then be expressed as, but are not limited to, (0, 1, 1, 0, 0), meaning that the parameters corresponding to the association information between the m-th task and the (m+1)-th task are B and C.
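In this example the weight vector acts as a binary mask over the attention module's parameter list. A minimal sketch of the selection (the helper name `select_weighted` and the parameter values are illustrative, not from the patent):

```python
# The parameter weights of the example act as a 0/1 mask over the
# attention module's parameter list: entries with weight 1 are the
# parameters carrying the association information between task m
# and task m+1.

def select_weighted(param_names, weights):
    """Return the parameters flagged by the 0/1 weight mask."""
    return [name for name, w in zip(param_names, weights) if w == 1]

params  = ["A", "B", "C", "D", "E"]   # all parameters of the attention module
weights = (0, 1, 1, 0, 0)             # the example mask from the text

selected = select_weighted(params, weights)
```

Applying the mask (0, 1, 1, 0, 0) to (A, B, C, D, E) selects B and C, as in the text.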

It can also be understood that the task corresponding to the m-th sub-model in this embodiment may be any task from the first task up to the penultimate task among the multiple tasks corresponding to the sample features. Possibly, when the number of tasks corresponding to the sample features is 5, m may take the value 1, 2, 3, or 4. Possibly, when the number of tasks corresponding to the sample features is 2, m can only take the value 1.

Step 304: determine the parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model.

Specifically, after the parameter weights in the attention module of the m-th sub-model are determined, they can be passed to the attention module of the adjacent (m+1)-th sub-model to adjust the parameters in the attention module of the (m+1)-th sub-model. Suppose, for example, that the parameters in the attention module of the (m+1)-th sub-model are B, D, E, and F, and that the parameters corresponding to the parameter weights in the attention module of the m-th sub-model are B and C; the adjusted parameters in the attention module of the (m+1)-th sub-model are then B, C, D, E, and F.

It can be understood that the attention modules of the sub-models in the multi-task model of this embodiment may all share the same structure, the difference being that the parameters of each sub-model's attention module differ.

Step 306: train the multi-task model based on the sample features, the parameter weights in the attention module of the m-th sub-model, and the parameters in the attention module of the (m+1)-th sub-model.

Specifically, after the parameters in the attention module of the (m+1)-th sub-model are determined according to the parameter weights in the attention module of the m-th sub-model, the sample user features and sample product features among the sample features can be input into the (m+1)-th sub-model, and the occurrence probability of the (m+1)-th task is obtained through the attention module of the (m+1)-th sub-model. It can be understood that the occurrence probability of the (m+1)-th task obtained in this way contains the association information among the first m tasks, and is therefore more accurate than the occurrence probability of the (m+1)-th task obtained in the prior art.

Further, after the occurrence probability of the (m+1)-th task is obtained, parameter-optimization training can be performed on the m-th and (m+1)-th sub-models of the multi-task model in combination with the sample result of the (m+1)-th task among the sample features, so that the occurrence probability of the (m+1)-th task moves closer to that sample result. It can be understood that, during the training of the multi-task model, as the parameters of the m-th sub-model change, the parameter weights in its attention module also change, and in turn the parameters in the attention module of the (m+1)-th sub-model change, which can drive the occurrence probability of the (m+1)-th task closer to the sample result of the (m+1)-th task among the sample features.

It should be noted that, when m is any positive integer greater than 1, the parameter weights in the attention module of each corresponding sub-model can be obtained successively in the order in which the tasks are arranged; the parameters in the attention module of the (m+1)-th sub-model can then be determined from the parameter weights in the attention module of the m-th sub-model, and all m+1 sub-models can be trained together during the training process to improve training efficiency. For example, when m is 3, the parameter weights in the attention modules of the first, second, and third sub-models can be determined in turn; the parameters in the attention module of the fourth sub-model are then determined from the parameter weights in the attention module of the third sub-model, yielding the occurrence probability of the fourth task, and parameter-optimization training is performed on the four sub-models according to the occurrence probability of the fourth task and the sample result corresponding to the fourth task.
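Steps 302 to 306 amount to one training iteration: chain the attention weights forward, read off the (m+1)-th task's predicted probability, and compare it with the known 0/1 sample result. The sketch below shows the shape of that iteration under assumptions the patent does not make explicit: binary cross-entropy is assumed as the loss, and `predict_probability` is a stand-in for the chained sub-models.

```python
import math

# Hedged sketch of one training iteration of Steps 302-306. The patent
# does not name a loss function; binary cross-entropy is assumed here,
# and predict_probability is a stand-in for running sub-models 1..m+1.

def binary_cross_entropy(p, y):
    """Loss between predicted probability p and 0/1 sample result y."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def predict_probability(sample_features, m):
    # Stand-in for: run sub-models 1..m, pass each attention weight
    # forward, and read the (m+1)-th task's occurrence probability
    # from the (m+1)-th sub-model's attention module.
    return 0.7  # placeholder value

sample_result_m_plus_1 = 1  # the known 0/1 sample result for task m+1
p = predict_probability({"user": "user_a", "product": "product_x"}, m=3)
loss = binary_cross_entropy(p, sample_result_m_plus_1)
# A real implementation would backpropagate this loss through all m+1
# sub-models, as the text describes.
```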

In the embodiments of this specification, the multi-task model may assign one sub-model to each task among the multiple tasks, and between adjacent tasks the attention module migrates the association information of the previous task to the next task, so that the next task can combine the association information of the previous task to obtain a more accurate and more strongly correlated prediction result.

As an option of this embodiment, the m-th sub-model further includes an embedding module, a combination module, and a first conversion module;

determining the parameter weights of the sample features in the attention module of the m-th sub-model includes:

inputting the sample user features and the sample product features into the embedding module to obtain feature vectors corresponding respectively to the sample user features and the sample product features;

inputting the feature vectors corresponding respectively to the sample user features and the sample product features into the combination module to obtain a combined feature vector;

inputting the combined feature vector into the first conversion module to obtain first conversion information; and

inputting the first conversion information into the attention module of the m-th sub-model to obtain the parameter weights of the sample features in the m-th sub-model.
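The four steps above form a simple forward pipeline through the m-th sub-model. A minimal sketch with stub stages (all function names and the toy computations are hypothetical; only the module order, embedding → combination → first conversion → attention, comes from the text):

```python
# Hypothetical forward pipeline of the m-th sub-model: embedding ->
# combination -> first conversion -> attention, as listed above.
# Each stage is a toy stand-in for the corresponding module.

def embed(user_features, product_features):
    # stand-in for the embedding module: map each feature to a number
    return ([len(f) % 10 for f in user_features],
            [len(f) % 10 for f in product_features])

def combine(user_vec, product_vec):
    # stand-in for the combination module: concatenate the two vectors
    return user_vec + product_vec

def first_conversion(combined):
    # stand-in for the first conversion module (e.g. a DNN in the text)
    return [x * 0.1 for x in combined]

def attention_weights(converted):
    # stand-in for the attention module: flag the largest entry as the
    # 0/1 parameter-weight mask
    threshold = max(converted)
    return [1 if x == threshold else 0 for x in converted]

user_vec, product_vec = embed(["user_a"], ["product_x", "promo"])
combined = combine(user_vec, product_vec)
converted = first_conversion(combined)
weights = attention_weights(converted)
```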

Specifically, reference may be made here to FIG. 4, a schematic structural diagram of a sub-model provided by an embodiment of this specification. As shown in FIG. 4, this diagram can represent the structure of the m-th sub-model in this embodiment, which, in order of connection, may include an embedding module, a combination module, a first conversion module, and an attention module. The embedding module can be used to take, from the sample features, the sample user features and the sample product features as input, and obtain the feature vectors corresponding respectively to the sample user features and the sample product features. It can be understood that the sample user features here may represent a sample user's identity features, such as, but not limited to, the user name, user category, and user address, and the sample product features may represent sample feature information characterizing the product corresponding to the event, such as, but not limited to, the event's sample product name, sample product production information, and sample product definition information, where the sample product definition information can be understood as information characterizing the product's functionality. The embedding module in this embodiment may be, but is not limited to, an embedding layer as used in natural language processing. A corpus containing a dictionary and characters may be preconfigured in the embedding module, with each character corresponding to one Chinese character in the dictionary; for example, the Chinese character "你" may correspond to character 1, and the Chinese character "好" to character 3. When the sample user features and sample product features are input into the embedding module, each Chinese character in them can be converted into its corresponding character according to the corpus, and the feature vectors corresponding respectively to the sample user features and the sample product features are output to the combination module in matrix form. It can also be understood that the order in which the sample user features and the sample product features are input is not limited here. Possibly, the sample user features are first input into the embedding module to obtain their corresponding feature vector, and the sample product features are then input to obtain theirs. Possibly, the sample product features are input first and the sample user features second. Possibly, the sample user features and the sample product features are input into the embedding module at the same time, and the feature vectors corresponding to each are obtained at the same time.

Further, after the feature vectors corresponding respectively to the sample user features and the sample product features are input into the combination module, the combination module can output the combined feature vector. The combined feature vector can be understood as the combination of the feature vector corresponding to the sample user features and the feature vector corresponding to the sample product features; for example, if the feature vector corresponding to the sample user features is [A, B] and the feature vector corresponding to the sample product features is [C, D, E], the combined feature vector obtained through the combination module can be expressed as [A, B, C, D, E].
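In this example the combination module reduces to a concatenation; a one-line sketch using the placeholder values from the text:

```python
# The combination module, in the example above, concatenates the user
# feature vector and the product feature vector.
user_vec = ["A", "B"]          # feature vector of the sample user features
product_vec = ["C", "D", "E"]  # feature vector of the sample product features
combined = user_vec + product_vec
```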

Further, after the combined feature vector is obtained from the combination module, it can be input into the first conversion module to obtain the first conversion information. The first conversion module can be understood as a neural network corresponding to the m-th task, used to derive, from the combined feature vector, vector information that can characterize the features corresponding to the m-th task. It can be understood that the first conversion module may be, but is not limited to, any deep learning neural network, such as a Deep Neural Network (DNN), a Deep Interest Network (DIN), or a Factorization-Machine based Neural Network (DeepFM).

Further, after the first conversion information is obtained from the first conversion module, it can be input into the attention module, which learns the parameter weights of the sample features in the m-th sub-model. The attention module can learn the vector information of the features corresponding to the association information between the m-th task and the (m+1)-th task within the first conversion information, so as to obtain the parameter weights corresponding to that vector information. It can be understood that the attention module here may be, but is not limited to, an application of the attention mechanism common in the art; this embodiment is not limited thereto.

As another option of this embodiment, the (m+1)-th sub-model further includes an embedding module, a combination module, and a second conversion module;

training the multi-task model based on the sample features, the parameter weights in the attention module of the m-th sub-model, and the parameters in the attention module of the (m+1)-th sub-model includes:

inputting the sample user features and the sample product features into the embedding module to obtain feature vectors corresponding respectively to the sample user features and the sample product features;

inputting the feature vectors corresponding respectively to the sample user features and the sample product features into the combination module to obtain a combined feature vector;

inputting the combined feature vector into the second conversion module to obtain second conversion information;

inputting the second conversion information into the attention module of the (m+1)-th sub-model, and obtaining the prediction result of the task corresponding to the (m+1)-th sub-model according to the second conversion information and the parameters in the attention module of the (m+1)-th sub-model; and

training the multi-task model according to the prediction result of the task corresponding to the (m+1)-th sub-model and the user's sample result for the task corresponding to the (m+1)-th sub-model.
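Putting the steps above together for the (m+1)-th sub-model: a minimal sketch in which the adjusted attention parameters, including those received from the m-th sub-model, weight the second conversion information into a probability. The sigmoid readout, the weighted-sum scoring, and all names and values are assumptions; the patent does not specify the exact computation inside the attention module.

```python
import math

# Hedged sketch of the (m+1)-th sub-model's prediction step: the second
# conversion information is weighted by the adjusted attention
# parameters (which include those received from the m-th sub-model),
# and a sigmoid readout is assumed to turn the score into a probability.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(second_conversion_info, adjusted_attention_params):
    # assumed scoring: weighted sum of the conversion information
    score = sum(v * adjusted_attention_params.get(k, 0.0)
                for k, v in second_conversion_info.items())
    return sigmoid(score)

# Adjusted parameters: B and C were received from the m-th sub-model;
# D, E, F are the (m+1)-th sub-model's own (values are illustrative).
adjusted = {"B": 0.7, "C": 0.3, "D": 0.4, "E": 0.6, "F": 0.8}
second_info = {"B": 1.0, "C": 1.0, "D": 0.5, "E": 0.0, "F": 0.0}

prob = predict(second_info, adjusted)
sample_result = 1  # known 0/1 sample result for the (m+1)-th task
# Training would then minimize a loss between prob and sample_result.
```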

Specifically, reference may be made here to FIG. 5, a schematic structural diagram of a multi-task model provided by an embodiment of this specification. As shown in FIG. 5, the multi-task model may include the m-th sub-model and the (m+1)-th sub-model, which may correspond respectively to the m-th and (m+1)-th tasks among the multiple tasks corresponding to the sample features. The m-th sub-model may include, in order of connection, an embedding module, a combination module, a first conversion module, and an attention module, and the (m+1)-th sub-model may include, in order of connection, an embedding module, a combination module, a second conversion module, and an attention module. It can be understood that the embedding module of the m-th sub-model and the embedding module of the (m+1)-th sub-model may be the same embedding module (which can also be understood as the two sharing one embedding module); this approach effectively preserves the association information between the m-th task and the (m+1)-th task. Likewise, the combination module of the m-th sub-model may be the same combination module as that of the (m+1)-th sub-model, so as to safeguard the reliability of the association information between the m-th task and the (m+1)-th task.

In the (m+1)-th sub-model, the embedding module may take as input the sample user features and the sample product features among the sample features, and produce feature vectors corresponding to the sample user features and the sample product features respectively. The sample user features may represent identity information of a sample user, such as, but not limited to, a user name, a user classification, and a user address; the sample product features may represent information characterizing the product corresponding to an event, such as, but not limited to, a sample product name, sample product production information, and sample product definition information, where the sample product definition information may be understood as information characterizing the product's functions. The embedding module in this embodiment may be, but is not limited to, an embedding layer (i.e., an Embedding layer) as used in natural language processing. A corpus containing a dictionary and characters may be preset in the embedding module, with each character index corresponding to one Chinese character in the dictionary; for example, the Chinese character "你" may correspond to character 1, and the Chinese character "好" may correspond to character 3. When the sample user features and the sample product features are input into the embedding module, each Chinese character in them may be converted into its corresponding character index according to the corpus, and the feature vectors corresponding to the sample user features and the sample product features may be output, in matrix form, to the combination module. The order in which the sample user features and the sample product features are input is not limited here: the sample user features may be input into the embedding module first to obtain their feature vector, followed by the sample product features; alternatively, the sample product features may be input first, followed by the sample user features; or both may be input into the embedding module at the same time, yielding both feature vectors simultaneously.
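As a minimal, hedged sketch of the character-to-index lookup described above (the corpus dictionary here is illustrative and covers only the "你" → 1, "好" → 3 example; a real corpus would span the full character vocabulary):

```python
# Hypothetical corpus: Chinese character -> character index, as in the
# "你" -> 1, "好" -> 3 example from the text.
corpus = {"你": 1, "好": 3}

def chars_to_ids(text, corpus, unk_id=0):
    """Convert each Chinese character of a feature string to its corpus index.

    Characters missing from the corpus fall back to a hypothetical
    unknown-character index (unk_id)."""
    return [corpus.get(ch, unk_id) for ch in text]

user_ids = chars_to_ids("你好", corpus)  # a toy sample user feature
print(user_ids)  # [1, 3]
```

In practice these indices would then be looked up in an embedding matrix to obtain the feature vectors passed to the combination module.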

Further, after the feature vectors corresponding respectively to the sample user features and the sample product features are input into the combination module, the combination module may output a combined feature vector. The combined feature vector may be understood as the combination of the feature vector corresponding to the sample user features and the feature vector corresponding to the sample product features: taking as an example a user feature vector of [A, B] and a product feature vector of [C, D, E], the combined feature vector obtained through the combination module may be [A, B, C, D, E].
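The combination step above amounts to concatenation, which can be sketched directly (the symbolic entries A..E stand in for real vector components):

```python
# Sketch of the combination module: the combined feature vector is the
# concatenation of the user feature vector and the product feature vector.
user_vec = ["A", "B"]          # feature vector of the sample user features
product_vec = ["C", "D", "E"]  # feature vector of the sample product features

combined = user_vec + product_vec
print(combined)  # ['A', 'B', 'C', 'D', 'E']
```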

Further, after the combined feature vector is obtained from the combination module, it may be input into the second conversion module to obtain second conversion information. The second conversion module may be understood as the neural network corresponding to the (m+1)-th task, used to derive, from the combined feature vector, vector information characterizing the features of the (m+1)-th task. It can be understood that the second conversion module may be, but is not limited to, any deep learning neural network, such as a Deep Neural Network (DNN), a Deep Interest Network (DIN), or a Factorization-Machine-based Neural Network (DeepFM).

Further, after the second conversion information is obtained from the second conversion module, it may be input into the attention module, and the prediction result of the (m+1)-th task may be obtained according to the adjusted parameters in the attention module and the sigmoid activation function in that module. It can be understood that the parameters of the attention module in the (m+1)-th sub-model may contain association information with the m-th task, so the occurrence probability of the (m+1)-th task obtained through this attention module is correlated with the m-th task, which makes the prediction result more reliable and accurate than in common techniques.

It can also be understood that a fully connected module may be arranged between the attention module of the (m+1)-th sub-model and the attention module of the m-th sub-model. After the parameter weights in the attention module of the m-th sub-model are determined, they may be input into the fully connected module to obtain connection information containing those parameter weights. It should be noted that the fully connected module in this embodiment serves to retain the parameter weights of the m-th sub-model's attention module and to pass them on, effectively guaranteeing the integrity of those parameter weights. After the fully connected module obtains the connection information containing the parameter weights of the m-th sub-model's attention module, it may input that connection information into the attention module of the (m+1)-th sub-model, so that the parameters of the (m+1)-th sub-model's attention module are adjusted in combination with the parameter weights of the m-th sub-model's attention module.
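As a hedged sketch of the weight-passing role of the fully connected module (the function name, vector sizes, and identity weight matrix below are assumptions for illustration, not details from the patent):

```python
# A fully connected layer y = W @ x + b, written with plain lists, used here
# to carry the m-th attention module's parameter weights forward.
def fully_connected(weights, W, b):
    return [
        sum(W[i][j] * weights[j] for j in range(len(weights))) + b[i]
        for i in range(len(W))
    ]

attn_weights_m = [0.2, 0.5, 0.1]  # hypothetical weights from sub-model m
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
bias = [0.0, 0.0, 0.0]

# With an identity weight matrix, the connection information preserves the
# m-th attention weights intact, matching the "integrity" requirement above.
connection_info = fully_connected(attn_weights_m, identity, bias)
print(connection_info)  # [0.2, 0.5, 0.1]
```

A trained fully connected module would of course learn W and b rather than fix them; the identity choice only illustrates the lossless transfer the text describes.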

Further, after the prediction result of the task corresponding to the (m+1)-th sub-model is obtained, that prediction result may be combined with the user's sample result for the same task to optimize and train the parameters of the multi-task model, until the prediction result of the task corresponding to the (m+1)-th sub-model approaches the user's sample result for that task. It can be understood that in this embodiment the multi-task model may be trained using, but not limited to, the prediction result and sample result of the (m+1)-th sub-model; for example, it may also be trained using the prediction result and sample result of the m-th sub-model, and the embodiment is not limited thereto.

It should be noted that the multi-task model mentioned above is not limited to including only the m-th and (m+1)-th sub-models; it may also include, for example, a first sub-model, a second sub-model, ..., an (m+2)-th sub-model, an (m+3)-th sub-model, and so on. Each sub-model may correspond to one task among the multiple tasks associated with the sample features, and each sub-model may include, in order of connection, an embedding module, a combination module, its own conversion module, and an attention module. It can be understood that a fully connected module may be arranged between the attention modules of any two adjacent sub-models, so as to pass the parameter weights of the attention module of the preceding sub-model to the attention module of the following sub-model.

As another option of this embodiment, after the prediction result of the task corresponding to the (m+1)-th sub-model is obtained according to the second conversion information and the parameters of the (m+1)-th sub-model, the method further includes:

obtaining a first loss function according to the prediction result of the task corresponding to the (m+1)-th sub-model and the sample features; and

optimizing the (m+1)-th sub-model based on the first loss function.

Specifically, during the training of the multi-task model, after the prediction result of the task corresponding to the (m+1)-th sub-model is obtained, the parameters of the (m+1)-th sub-model may be optimized by computing a cross-entropy loss function (corresponding to the first loss function mentioned above). The cross-entropy loss function may be computed according to the following formula (1):

L_ce(θ) = -(1/N) Σ_{(x, y_t)∈D} [ y_t·log ŷ_t + (1 − y_t)·log(1 − ŷ_t) ]   (1)

In formula (1), L_ce denotes the cross-entropy loss function of the (m+1)-th sub-model, θ denotes the parameters of the (m+1)-th sub-model, N is the number of sample features, D is the sample set, and (x, y_t) is one sample in the sample set D, where x denotes the sample user features and sample product features corresponding to the (m+1)-th task, y_t denotes the user's sample result for the (m+1)-th task, and ŷ_t denotes the prediction result of the task corresponding to the (m+1)-th sub-model. Here y_t may take, but is not limited to, the value 1 or 0, where 0 indicates that the (m+1)-th task was not completed and 1 indicates that it was completed.

As another option of this embodiment, after the first loss function is obtained according to the prediction result of the task corresponding to the (m+1)-th sub-model and the sample features, and before the (m+1)-th sub-model is optimized based on the first loss function, the method further includes:

judging whether the prediction result of the task corresponding to the (m+1)-th sub-model satisfies a preset condition;

when it is determined that the prediction result of the task corresponding to the (m+1)-th sub-model does not satisfy the preset condition, obtaining the prediction result of the task corresponding to the m-th sub-model according to the first conversion information and the parameters in the attention module of the m-th sub-model; and

obtaining a second loss function according to the prediction result of the task corresponding to the m-th sub-model, the prediction result of the task corresponding to the (m+1)-th sub-model, and the sample features.

Optimizing the (m+1)-th sub-model based on the first loss function then includes:

optimizing the (m+1)-th sub-model based on the first loss function and the second loss function.

In practice, the (m+1)-th task can only be completed if the m-th task has been completed; that is, the predicted probability of the m-th task should be higher than that of the (m+1)-th task. Based on this, it is possible to first judge whether the prediction result of the (m+1)-th task meets the requirement, and then optimize the parameters of the (m+1)-th sub-model according to the judgment result.

Specifically, after the cross-entropy loss function of the (m+1)-th sub-model is computed, the prediction result of the task corresponding to the (m+1)-th sub-model may be compared with a preset result. When it is determined that this prediction result is higher than the preset result, the prediction result of the m-th task is first obtained according to the parameters in the attention module of the m-th sub-model and the sigmoid activation function in that module; a probability calibration loss function (the second loss function mentioned above) is then obtained by combining the prediction result of the task corresponding to the m-th sub-model, the prediction result of the task corresponding to the (m+1)-th sub-model, and the sample features. Before the prediction result of the m-th task is obtained, the parameters in the attention module of the m-th sub-model may first be determined according to the parameter weights in the attention module of the (m-1)-th sub-model; the sample user features and sample product features are then input into the m-th sub-model and pass in turn through the embedding module, the combination module, the first conversion module, and the attention module, with the prediction result of the m-th task obtained through the sigmoid activation function in the attention module.

It can be understood that the preset result mentioned above may be determined according to the user's sample result for the m-th task; for example, but not limited to, the preset result may be set to be smaller than the user's sample result for the m-th task.

The probability calibration loss function here may be computed according to the following formula (2):

L_le(θ) = (1/N) Σ_{(x, y_t)∈D} max( ŷ_t^(m+1) − ŷ_t^(m), 0 )   (2)

In formula (2), L_le denotes the probability calibration loss function of the (m+1)-th sub-model, ŷ_t^(m) denotes the prediction result of the task corresponding to the m-th sub-model, and ŷ_t^(m+1) denotes the prediction result of the task corresponding to the (m+1)-th sub-model. When ŷ_t^(m+1) is greater than ŷ_t^(m) (that is, when the prediction result of the task corresponding to the (m+1)-th sub-model exceeds that of the task corresponding to the m-th sub-model), the prediction result of the (m+1)-th task does not conform to the actual situation, and this probability calibration loss function is introduced to calibrate the multi-task model.
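As a hedged sketch, assuming the calibration loss takes a hinge (max-with-zero) form consistent with the description above, i.e. the loss is positive only when the (m+1)-th task's predicted probability exceeds the m-th task's:

```python
# Probability calibration loss: penalize violations of the sequential
# dependency (task m+1 should not be more likely than task m).
def calibration_loss(pred_m, pred_m1):
    n = len(pred_m)
    return sum(max(p1 - p0, 0.0) for p0, p1 in zip(pred_m, pred_m1)) / n

# Consistent ordering (task m >= task m+1) incurs no loss:
assert calibration_loss([0.8, 0.6], [0.5, 0.6]) == 0.0
# A violation (0.7 > 0.4) is penalized:
loss = calibration_loss([0.4], [0.7])  # ≈ 0.3
```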

For ease of understanding, when the (m+1)-th sub-model is optimized based on the first loss function and the second loss function, a target loss function of the (m+1)-th sub-model may first be determined according to the first and second loss functions, and the parameters of the (m+1)-th sub-model may then be optimized through that target loss function. The target loss function of the (m+1)-th sub-model may be computed according to the following formula (3):

L(θ) = L_ce(θ) + α·L_le(θ)   (3)

In formula (3), L(θ) denotes the target loss function of the (m+1)-th sub-model, L_ce(θ) denotes its cross-entropy loss function, L_le(θ) denotes its probability calibration loss function, and α denotes a calibration parameter; a larger α indicates a greater weight for the probability calibration loss function of the (m+1)-th sub-model.

It can be understood that the above formula (3) may also be used to optimize the parameters of any other sub-model in the multi-task model, and this embodiment is not limited thereto.

It should be noted that, to further improve the parameter optimization effect for the (m+1)-th sub-model of the multi-task model, a mean square error module may also be added to each of the m-th and (m+1)-th sub-models. When the sample user features and sample product features pass through the embedding module, they may be abstracted into a mapping function; a mean square error loss function of the (m+1)-th sub-model is then constructed based on the mapping function of the m-th sub-model and the mapping function of the (m+1)-th sub-model respectively, and the parameters of the (m+1)-th sub-model are optimized by combining the above-mentioned cross-entropy loss function, the probability calibration loss function, and this mean square error loss function.

The mean square error loss function of the (m+1)-th sub-model may be computed according to the following formula (4):

L_mse = (1/N) Σ_{i=1}^{N} ( f_t(x_i) − f_{t−1}(x_i) )²   (4)

In formula (4), L_mse denotes the mean square error loss function of the (m+1)-th sub-model, f_t(x_i) denotes the mapping function of the (m+1)-th sub-model, and f_{t−1}(x_i) denotes the mapping function of the m-th sub-model.
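The mean square error term of formula (4) can be transcribed as follows (the mapping-function outputs are illustrative values, standing in for the embeddings f_t(x_i) and f_{t-1}(x_i)):

```python
# Mean squared difference between the (m+1)-th and m-th sub-models'
# mapping-function outputs over the samples, per formula (4).
def mse_loss(f_t_vals, f_t1_vals):
    n = len(f_t_vals)
    return sum((a - b) ** 2 for a, b in zip(f_t_vals, f_t1_vals)) / n

loss = mse_loss([1.0, 2.0], [0.5, 2.5])  # (0.5^2 + (-0.5)^2) / 2 = 0.25
```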

Combining the above formula (3) with the mean square error loss function, a new target loss function of the (m+1)-th sub-model may be obtained through the following formula (5):

L(θ) = L_ce(θ) + α·L_le(θ) + γ·L_mse   (5)

In formula (5), L(θ) denotes the target loss function of the (m+1)-th sub-model, L_ce(θ) denotes its cross-entropy loss function, L_le(θ) denotes its probability calibration loss function, and L_mse denotes its mean square error loss function; α denotes a calibration parameter and γ denotes an error parameter, where γ may be any natural number.
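The weighted combination of formula (5) is a one-liner; the α and γ values below are illustrative, not prescribed by the patent:

```python
# Target loss of formula (5): L = L_ce + alpha * L_le + gamma * L_mse.
def target_loss(l_ce, l_le, l_mse, alpha=1.0, gamma=1.0):
    return l_ce + alpha * l_le + gamma * l_mse

# Toy component losses with hypothetical weightings:
total = target_loss(0.16, 0.05, 0.25, alpha=0.5, gamma=0.1)
print(round(total, 4))  # 0.21
```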

As another option of this embodiment, after the prediction result of the task corresponding to the m-th sub-model is obtained according to the first conversion information and the parameter weights of the m-th sub-model, the method further includes:

obtaining a third loss function according to the prediction result of the task corresponding to the m-th sub-model and the sample features; and

optimizing the m-th sub-model based on the third loss function.

During the training of the multi-task model, in addition to optimizing the parameters of the (m+1)-th sub-model, the parameters of any other sub-model in the multi-task model may also be optimized, to further ensure the accuracy of the multi-task model's prediction results.

Specifically, after the prediction result of the task corresponding to the m-th sub-model is obtained, the parameters of the m-th sub-model may be optimized by computing a cross-entropy loss function (corresponding to the third loss function mentioned above); for the computation of the cross-entropy loss function, refer to formula (1) above, which is not repeated here.

Referring to FIG. 6, FIG. 6 shows a schematic flowchart of a multi-task prediction method provided by an embodiment of this specification.

As shown in FIG. 6, the multi-task prediction method may include at least the following steps.

Step 602: determine the parameter weights of the target features in the attention module of the m-th sub-model.

Specifically, the target features may include target user features and target product features. The target user features may be understood as target feature information of the user who is to execute an event, such as the user's target identity information, which may specifically include at least one of a user name, a user classification, and a user address. The target product features may be understood as target feature information characterizing the product corresponding to the event, such as at least one of a target product name, target product production information, and target product definition information, where the target product definition information may be understood as information characterizing the product's functions.

After the target user features and target product features are first determined, the parameter weights in the attention module of the model corresponding to the m-th task among the multiple tasks may be obtained according to the target user features and target product features. The multi-task model of this embodiment may include M sub-models, each of which may include one attention module; each task among the multiple tasks corresponding to the target features may correspond to one sub-model, and adjacent tasks may correspond to two adjacent sub-models. For example, the first task among the multiple tasks corresponding to the target features may correspond to the first sub-model of the multi-task model, the second task to the second sub-model, and the m-th task to the m-th sub-model; that is, the parameter weights in the attention module of the model corresponding to the m-th task mentioned above may be understood as the parameter weights in the attention module of the m-th sub-model of the multi-task model. Here M may be a positive integer greater than m; for example, when m is 2, M may be a positive integer greater than 2.

It can be understood that, since the m-th task among the multiple tasks corresponding to the target features may correspond to the m-th sub-model of the multi-task model, the parameter weights in the attention module of the m-th sub-model may be used to characterize the parameters corresponding to the association information between the m-th task and the (m+1)-th task; these may also correspond to a subset of all the parameters in the attention module of the sub-model corresponding to the m-th task. Taking as an example an attention module of the m-th task's sub-model whose parameters are A, B, C, D, and E, its parameter weights may be, but are not limited to, (0, 1, 1, 0, 0); that is, the parameters corresponding to the association information between the m-th task and the (m+1)-th task are B and C.
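The 0/1 weight example above can be sketched as a simple mask selection (the symbolic parameter names are illustrative):

```python
# A 0/1 parameter-weight vector over the attention module's parameters A..E
# selects the parameters (B and C) carrying the association between task m
# and task m+1.
params = ["A", "B", "C", "D", "E"]
weights = [0, 1, 1, 0, 0]

selected = [p for p, w in zip(params, weights) if w == 1]
print(selected)  # ['B', 'C']
```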

It can also be understood that the task corresponding to the m-th sub-model in this embodiment may be any task from the first to the second-to-last among the multiple tasks corresponding to the target features. For example, when the number of tasks corresponding to the target features is 5, m may take the value 1, 2, 3, or 4; when the number of tasks corresponding to the target features is 2, m may only take the value 1.

Step 604: determine the parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model.

Specifically, after the parameter weights in the attention module of the m-th sub-model are determined, they may be passed to the attention module of the adjacent (m+1)-th sub-model to adjust the parameters in that module. Taking as an example an (m+1)-th attention module whose parameters are B, D, E, and F, with the parameters corresponding to the m-th attention module's parameter weights being B and C, the adjusted parameters of the (m+1)-th attention module may be B, C, D, E, and F.
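The adjustment example above behaves like a set union of the two parameter collections (again with symbolic parameter names standing in for real parameters):

```python
# Parameters of the (m+1)-th attention module are extended with those
# carried over from sub-model m: {B, D, E, F} + {B, C} -> B, C, D, E, F.
params_m1 = ["B", "D", "E", "F"]
transferred = ["B", "C"]  # from the m-th attention module's parameter weights

adjusted = sorted(set(params_m1) | set(transferred))
print(adjusted)  # ['B', 'C', 'D', 'E', 'F']
```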

It can be understood that the attention modules included in each sub-model of the multi-task model of this embodiment may adopt the same structure, the difference being that the parameters in each sub-model's attention module are different.

Step 606: obtain the prediction result of the task corresponding to the (m+1)-th sub-model based on the target features and the parameters in the attention module of the (m+1)-th sub-model.

In this embodiment, the m-th sub-model may include, in order of connection, an embedding module, a combination module, a first conversion module, and an attention module; the (m+1)-th sub-model may include, in order of connection, an embedding module, a combination module, a second conversion module, and an attention module. It can be understood that the embedding module in the m-th sub-model and the embedding module in the (m+1)-th sub-model may be the same embedding module (that is, the two sub-models share one embedding module), which effectively retains the association information between the m-th task and the (m+1)-th task. Likewise, the combination module in the m-th sub-model and the combination module in the (m+1)-th sub-model may be the same combination module, which ensures the reliability of that association information.

Specifically, in the (m+1)-th sub-model, the embedding module may take as input the target user features and the target product features among the target features, and produce feature vectors corresponding to the target user features and the target product features respectively. The target user features may represent identity information of the target user, such as, but not limited to, a user name, a user classification, and a user address; the target product features may represent information characterizing the product corresponding to the event, such as, but not limited to, a target product name, target product production information, and target product definition information, where the target product definition information may be understood as information characterizing the product's functions. The embedding module in this embodiment may be, but is not limited to, an embedding layer (i.e., an Embedding layer) as used in natural language processing. A corpus containing a dictionary and characters may be preset in the embedding module, with each character index corresponding to one Chinese character in the dictionary; for example, the Chinese character "你" may correspond to character 1, and the Chinese character "好" may correspond to character 3. When the target user features and the target product features are input into the embedding module, each Chinese character in them may be converted into its corresponding character index according to the corpus, and the feature vectors corresponding to the target user features and the target product features may be output, in matrix form, to the combination module. The order in which the target user features and the target product features are input is not limited here: the target user features may be input into the embedding module first to obtain their feature vector, followed by the target product features; alternatively, the target product features may be input first, followed by the target user features; or both may be input into the embedding module at the same time, yielding both feature vectors simultaneously.

进一步的，在将分别与目标用户特征以及目标产品特征对应的特征向量输入至组合模块之后，可由组合模块输出组合特征向量。其中，组合特征向量可理解为与目标用户特征对应的特征向量以及与目标产品特征对应的特征向量的组合，此处以与目标用户特征对应的特征向量可表示为[A,B]，以及与目标产品特征对应的特征向量可表示为[C,D,E]为例，经过组合模块得到的组合特征向量可表示为[A,B,C,D,E]。Further, after the feature vectors corresponding respectively to the target user features and the target product features are input into the combination module, the combination module can output a combined feature vector. The combined feature vector can be understood as the combination of the feature vector corresponding to the target user features and the feature vector corresponding to the target product features. Taking as an example that the feature vector corresponding to the target user features is [A, B] and the feature vector corresponding to the target product features is [C, D, E], the combined feature vector obtained through the combination module can be expressed as [A, B, C, D, E].
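The [A, B] + [C, D, E] → [A, B, C, D, E] example above is a plain concatenation, which can be sketched as (the numeric values are placeholders, not from the patent):

```python
import numpy as np

user_vec = np.array([1.0, 2.0])          # [A, B]
product_vec = np.array([3.0, 4.0, 5.0])  # [C, D, E]

# The combination module simply concatenates the two feature vectors.
combined = np.concatenate([user_vec, product_vec])  # [A, B, C, D, E]
```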

进一步的，在根据组合模块得到组合特征向量之后，可将该组合特征向量输入至第二转换模块中，得到第二转换信息。其中，第二转换模块可理解为与第m+1个任务对应的神经网络，用于根据组合特征向量得到可表征第m+1个任务对应特征的向量信息。可以理解的是，第二转换模块可以但不局限于为深度神经网络(Deep Neural Networks,DNN)、深度兴趣学习网络(Deep Interest Network,DIN)或是深度推荐网络(Factorization-Machine based Neural Network,DeepFM)等任意一种深度学习神经网络。Further, after the combined feature vector is obtained from the combination module, the combined feature vector can be input into the second conversion module to obtain second conversion information. The second conversion module can be understood as the neural network corresponding to the (m+1)-th task, used to derive, from the combined feature vector, vector information that characterizes the features of the (m+1)-th task. It can be understood that the second conversion module can be, but is not limited to, any deep learning neural network, such as a Deep Neural Network (DNN), a Deep Interest Network (DIN) or a Factorization-Machine based Neural Network (DeepFM).
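Since the patent allows any deep learning network here (DNN, DIN, DeepFM, etc.), a minimal stand-in for the second conversion module is a small multilayer perceptron. The class name, layer sizes and weight initialisation below are assumptions for illustration only:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class ConversionModule:
    """Hypothetical two-layer MLP standing in for the 'second conversion
    module': maps a combined feature vector to conversion information."""
    def __init__(self, in_dim, hidden_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(scale=0.1, size=(hidden_dim, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, x):
        # Hidden layer with ReLU, then a linear output layer.
        return relu(x @ self.w1 + self.b1) @ self.w2 + self.b2

convert = ConversionModule(in_dim=5, hidden_dim=8, out_dim=4)
conversion_info = convert(np.ones(5))  # the "second conversion information"
```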

进一步的，在根据第二转换模块得到第二转换信息之后，可将该第二转换信息输入至注意力模块，并根据注意力模块中调整的参数以及该注意力模块中的激活函数sigmoid得到该第m+1个任务的预测结果。可以理解的是，该第m+1个子模型中的注意力模块的参数可包含有与第m个任务之间的关联信息，通过该第m+1个子模型中的注意力模块得到的第m+1个任务的发生概率具有与第m个任务之间的关联性，进而相较于常见技术可使该预测结果更具可靠性以及准确性。Further, after the second conversion information is obtained from the second conversion module, the second conversion information can be input into the attention module, and the prediction result of the (m+1)-th task can be obtained according to the adjusted parameters in the attention module and the sigmoid activation function in the attention module. It can be understood that the parameters of the attention module in the (m+1)-th sub-model may contain association information with the m-th task, so the occurrence probability of the (m+1)-th task obtained through this attention module is correlated with the m-th task, which makes the prediction result more reliable and accurate compared with common techniques.
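The step above (attention parameters applied to the conversion information, squashed by sigmoid into an occurrence probability) can be sketched as follows. The attention module's internal structure is not specified by the patent, so the linear form used here is an assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AttentionModule:
    """Hypothetical attention head: adjustable parameters over the
    conversion information, with sigmoid yielding a probability in (0, 1)."""
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.params = rng.normal(scale=0.1, size=dim)  # adjustable parameters

    def predict(self, conversion_info):
        # Weighted sum of the conversion information, then sigmoid.
        return sigmoid(conversion_info @ self.params)

attn = AttentionModule(dim=4)
prob = attn.predict(np.array([0.5, -0.2, 0.1, 0.3]))  # task occurrence probability
```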

请参阅图7,图7示出了本说明书实施例提供的一种多任务模型训练装置的结构示意图。Referring to FIG. 7 , FIG. 7 shows a schematic structural diagram of a multi-task model training apparatus provided by an embodiment of the present specification.

该多任务模型包括M个子模型,每个子模型分别对应一个任务,每个子模型分别包括一个注意力模块。如图7所示,该多任务模型训练装置700至少还可以包括第一处理模块701、第二处理模块702以及训练模块703,其中:The multi-task model includes M sub-models, each sub-model corresponds to a task, and each sub-model includes an attention module. As shown in FIG. 7 , the multi-task model training apparatus 700 may further include at least a first processing module 701, a second processing module 702 and a training module 703, wherein:

第一处理模块701,用于确定样本特征在第m个子模型的注意力模块中的参数权重;其中,m为小于M的正整数;The first processing module 701 is used to determine the parameter weight of the sample feature in the attention module of the mth sub-model; wherein, m is a positive integer smaller than M;

第二处理模块702，用于根据所述第m个子模型的注意力模块中的参数权重确定第m+1个子模型的注意力模块中的参数；其中，所述样本特征包括样本用户特征、样本产品特征以及用户对于第m+1个子模型对应的任务的样本结果；The second processing module 702 is configured to determine the parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model; wherein the sample features include sample user features, sample product features, and the user's sample results for the task corresponding to the (m+1)-th sub-model;

训练模块703,用于基于所述样本特征、所述第m个子模型的注意力模块中的参数权重以及所述第m+1个子模型的注意力模块中的参数对所述多任务模型进行训练。A training module 703, configured to train the multi-task model based on the sample features, the parameter weights in the attention module of the m th sub-model, and the parameters in the attention module of the m+1 th sub-model .

在一些可能的实施例中,第m个子模型还包括嵌入模块、组合模块以及第一转换模块;In some possible embodiments, the mth sub-model further includes an embedding module, a combining module and a first converting module;

第一处理模块701包括:The first processing module 701 includes:

第一嵌入单元,用于将样本用户特征以及样本产品特征输入至嵌入模块,得到分别与样本用户特征以及样本产品特征对应的特征向量;a first embedding unit, configured to input the sample user features and the sample product features into the embedding module to obtain feature vectors corresponding to the sample user features and the sample product features respectively;

第一组合单元,用于将分别与样本用户特征以及样本产品特征对应的特征向量输入至组合模块,得到组合特征向量;The first combining unit is used to input the feature vectors corresponding to the sample user features and the sample product features respectively into the combining module to obtain the combined feature vector;

第一转换单元,用于将组合特征向量输入至第一转换模块,得到第一转换信息;a first conversion unit, configured to input the combined feature vector into a first conversion module to obtain first conversion information;

第一生成单元,用于将第一转换信息输入至第m个子模型的注意力模块,得到样本特征在第m个子模型的参数权重。The first generating unit is configured to input the first conversion information to the attention module of the mth sub-model, and obtain the parameter weight of the sample feature in the mth sub-model.

在一些可能的实施例中,第m个子模型的注意力模块与第m+1个子模型的注意力模块之间设置有全连接模块;In some possible embodiments, a fully connected module is set between the attention module of the mth sub-model and the attention module of the m+1th sub-model;

第二处理模块702包括:The second processing module 702 includes:

连接单元,用于将第m个子模型的注意力模块中的参数权重输入至全连接模块,得到包含第m个子模型的注意力模块中的参数权重的连接信息;The connection unit is used for inputting the parameter weights in the attention module of the mth sub-model to the fully connected module to obtain connection information including the parameter weights in the attention module of the mth sub-model;

第二生成单元,用于将连接信息输入至第m+1个子模型的注意力模块,得到第m+1个子模型的注意力模块中的参数。The second generating unit is configured to input the connection information into the attention module of the m+1 th sub-model, and obtain the parameters in the attention module of the m+1 th sub-model.
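The two units above describe passing the m-th attention module's parameter weights through a fully connected module to produce the (m+1)-th attention module's parameters. A minimal sketch, with all dimensions and the linear form assumed for illustration:

```python
import numpy as np

# Parameter weights produced by the attention module of sub-model m.
weights_m = np.array([0.2, 0.5, 0.3])

# Hypothetical fully connected module: a linear map whose output is the
# "connection information" carrying sub-model m's weights forward.
rng = np.random.default_rng(0)
fc_weight = rng.normal(scale=0.1, size=(3, 4))
fc_bias = np.zeros(4)

connection_info = weights_m @ fc_weight + fc_bias

# Sub-model m+1's attention module takes its parameters from this.
params_m_plus_1 = connection_info.copy()
```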

在一些可能的实施例中,第m+1个子模型还包括嵌入模块、组合模块以及第二转换模块;In some possible embodiments, the m+1 th sub-model further includes an embedding module, a combining module and a second converting module;

训练模块703包括:Training module 703 includes:

第二嵌入单元,用于将样本用户特征以及样本产品特征输入至嵌入模块,得到分别与样本用户特征以及样本产品特征对应的特征向量;The second embedding unit is used for inputting the sample user features and the sample product features into the embedding module to obtain feature vectors corresponding to the sample user features and the sample product features respectively;

第二组合单元,用于将分别与样本用户特征以及样本产品特征对应的特征向量输入至组合模块,得到组合特征向量;The second combining unit is used to input the feature vectors corresponding to the sample user features and the sample product features respectively into the combining module to obtain the combined feature vector;

第二转换单元,用于将组合特征向量输入至第二转换模块,得到第二转换信息;a second conversion unit, configured to input the combined feature vector into a second conversion module to obtain second conversion information;

第三生成单元，用于将第二转换信息输入至第m+1个子模型的注意力模块，并根据第二转换信息以及第m+1个子模型的注意力模块中的参数得到第m+1个子模型对应的任务的预测结果；The third generating unit is configured to input the second conversion information into the attention module of the (m+1)-th sub-model, and obtain the prediction result of the task corresponding to the (m+1)-th sub-model according to the second conversion information and the parameters in the attention module of the (m+1)-th sub-model;

训练单元,用于根据第m+1个子模型对应的任务的预测结果以及用户对于第m+1个子模型对应的任务的样本结果,对多任务模型进行训练。The training unit is configured to train the multi-task model according to the prediction result of the task corresponding to the m+1 th sub-model and the user's sample result of the task corresponding to the m+1 th sub-model.
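Training against the user's sample result, as the training unit above describes, typically compares the predicted probability with a 0/1 label. The patent does not name a loss, so binary cross-entropy is used here purely as a plausible example:

```python
import numpy as np

def bce_loss(pred, label, eps=1e-7):
    """Binary cross-entropy between the predicted occurrence probability
    for the (m+1)-th task and the user's sample result (0 or 1)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -(label * np.log(pred) + (1.0 - label) * np.log(1.0 - pred))

# A prediction of 0.8 against a positive sample result.
loss = bce_loss(pred=0.8, label=1.0)
```

A better prediction yields a smaller loss, which is what drives the optimisation of the sub-model.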

在一些可能的实施例中,训练模块703还包括:In some possible embodiments, the training module 703 further includes:

第一计算单元，用于在根据第二转换信息以及第m+1个子模型的参数得到第m+1个子模型对应的任务的预测结果之后，根据第m+1个子模型对应的任务的预测结果以及样本特征，得到第一损失函数；The first computing unit is configured to, after the prediction result of the task corresponding to the (m+1)-th sub-model is obtained according to the second conversion information and the parameters of the (m+1)-th sub-model, obtain the first loss function according to that prediction result and the sample features;

第一优化单元,用于基于第一损失函数对第m+1个子模型进行优化。The first optimization unit is configured to optimize the m+1 th sub-model based on the first loss function.

在一些可能的实施例中,训练模块703还包括:In some possible embodiments, the training module 703 further includes:

判断单元，用于在根据第m+1个子模型对应的任务的预测结果以及样本特征，得到第一损失函数之后，基于第一损失函数对第m+1个子模型进行优化之前，判断第m+1个子模型对应的任务的预测结果是否满足预设条件；The judging unit is configured to, after the first loss function is obtained according to the prediction result of the task corresponding to the (m+1)-th sub-model and the sample features, and before the (m+1)-th sub-model is optimized based on the first loss function, judge whether the prediction result of the task corresponding to the (m+1)-th sub-model satisfies a preset condition;

第二计算单元，用于在确定第m+1个子模型对应的任务的预测结果不满足预设条件时，根据第一转换信息以及第m个子模型的注意力模块中的参数得到第m个子模型对应的任务的预测结果；The second computing unit is configured to, when it is determined that the prediction result of the task corresponding to the (m+1)-th sub-model does not satisfy the preset condition, obtain the prediction result of the task corresponding to the m-th sub-model according to the first conversion information and the parameters in the attention module of the m-th sub-model;

第三计算单元,用于根据第m个子模型对应的任务的预测结果、第m+1个子模型对应的任务的预测结果以及样本特征,得到第二损失函数;The third computing unit is used to obtain the second loss function according to the prediction result of the task corresponding to the mth submodel, the prediction result of the task corresponding to the m+1th submodel, and the sample characteristics;

第一优化单元具体用于:The first optimization unit is specifically used for:

基于第一损失函数以及第二损失函数对第m+1个子模型进行优化。The m+1 th sub-model is optimized based on the first loss function and the second loss function.
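Optimising the (m+1)-th sub-model on both loss functions means combining them into one objective. The patent does not specify how they are combined; the additive form and the 0.5 mixing weight below are assumptions, and the numeric loss values are placeholders:

```python
# Hypothetical combination of the two losses when the (m+1)-th prediction
# fails the preset condition.
first_loss = 0.42   # loss of the (m+1)-th task vs. its sample result
second_loss = 0.17  # joint loss also involving the m-th task's prediction

# Assumed weighted sum; sub-model m+1 would be optimised on total_loss.
total_loss = first_loss + 0.5 * second_loss
```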

在一些可能的实施例中,训练模块703还包括:In some possible embodiments, the training module 703 further includes:

第四计算单元，用于在根据第一转换信息以及第m个子模型的参数权重得到第m个子模型对应的任务的预测结果之后，根据第m个子模型对应的任务的预测结果以及样本特征，得到第三损失函数；The fourth computing unit is configured to, after the prediction result of the task corresponding to the m-th sub-model is obtained according to the first conversion information and the parameter weights of the m-th sub-model, obtain the third loss function according to that prediction result and the sample features;

第二优化单元,用于基于第三损失函数对第m个子模型进行优化。The second optimization unit is configured to optimize the mth sub-model based on the third loss function.

请参阅图8,图8示出了本说明书实施例提供的一种多任务预测装置的结构示意图。Referring to FIG. 8 , FIG. 8 shows a schematic structural diagram of a multi-task prediction apparatus provided by an embodiment of the present specification.

如图8所示,该多任务预测装置800应用于多任务模型,多任务模型包括M个子模型,每个子模型分别对应一个任务,每个子模型分别包括一个注意力模块,该多任务预测装置800至少可以包括第三处理模块801、第四处理模块802以及预测模块803,其中:As shown in FIG. 8 , the multi-task prediction device 800 is applied to a multi-task model. The multi-task model includes M sub-models, each sub-model corresponds to a task, and each sub-model includes an attention module. The multi-task prediction device 800 It may at least include a third processing module 801, a fourth processing module 802 and a prediction module 803, wherein:

第三处理模块801,用于确定目标特征在第m个子模型的注意力模块中的参数权重;其中,m为小于M的正整数;The third processing module 801 is used to determine the parameter weight of the target feature in the attention module of the mth sub-model; wherein, m is a positive integer smaller than M;

第四处理模块802,用于根据第m个子模型的注意力模块中的参数权重确定第m+1个子模型的注意力模块中的参数;其中,目标特征包括目标用户特征以及目标产品特征;The fourth processing module 802 is used to determine the parameters in the attention module of the m+1 th sub-model according to the parameter weight in the attention module of the m th sub-model; wherein, the target features include target user features and target product features;

预测模块803,用于基于目标特征以及第m+1个子模型的注意力模块中的参数,得到第m+1个子模型对应任务的预测结果。The prediction module 803 is configured to obtain the prediction result of the task corresponding to the m+1 th sub-model based on the target feature and the parameters in the attention module of the m+1 th sub-model.

请参阅图9,图9示出了本说明书实施例提供的又一种多任务模型训练装置的结构示意图。Please refer to FIG. 9. FIG. 9 shows a schematic structural diagram of another multi-task model training apparatus provided by an embodiment of the present specification.

该多任务模型包括M个子模型,每个子模型分别对应一个任务,每个子模型分别包括一个注意力模块。如图9所示,该多任务模型训练装置900还可以包括:至少一个处理器901、至少一个网络接口904、用户接口903、存储器905以及至少一个通信总线902。The multi-task model includes M sub-models, each sub-model corresponds to a task, and each sub-model includes an attention module. As shown in FIG. 9 , the multi-task model training apparatus 900 may further include: at least one processor 901 , at least one network interface 904 , user interface 903 , memory 905 and at least one communication bus 902 .

其中,通信总线902可用于实现上述各个组件的连接通信。Wherein, the communication bus 902 can be used to realize the connection and communication of the above components.

其中,用户接口903可以包括按键,可选用户接口还可以包括标准的有线接口、无线接口。The user interface 903 may include buttons, and the optional user interface may also include a standard wired interface and a wireless interface.

其中,网络接口904可以但不局限于包括蓝牙模块、NFC模块、Wi-Fi模块等。The network interface 904 may include, but is not limited to, a Bluetooth module, an NFC module, a Wi-Fi module, and the like.

其中,处理器901可以包括一个或者多个处理核心。处理器901利用各种接口和线路连接整个多任务模型训练装置900内的各个部分,通过运行或执行存储在存储器905内的指令、程序、代码集或指令集,以及调用存储在存储器905内的数据,执行多任务模型训练装置900的各种功能和处理数据。可选的,处理器901可以采用DSP、FPGA、PLA中的至少一种硬件形式来实现。处理器901可集成CPU、GPU和调制解调器等中的一种或几种的组合。其中,CPU主要处理操作系统、用户界面和应用程序等;GPU用于负责显示屏所需要显示的内容的渲染和绘制;调制解调器用于处理无线通信。可以理解的是,上述调制解调器也可以不集成到处理器901中,单独通过一块芯片进行实现。The processor 901 may include one or more processing cores. The processor 901 uses various interfaces and lines to connect various parts of the entire multitasking model training device 900, and by running or executing the instructions, programs, code sets or instruction sets stored in the memory 905, and calling the data, perform various functions of the multi-task model training apparatus 900 and process data. Optionally, the processor 901 may be implemented in at least one hardware form among DSP, FPGA, and PLA. The processor 901 may integrate one or a combination of a CPU, a GPU, a modem, and the like. Among them, the CPU mainly handles the operating system, user interface, and application programs; the GPU is used to render and draw the content that needs to be displayed on the display screen; the modem is used to handle wireless communication. It can be understood that, the above-mentioned modem may not be integrated into the processor 901, and is implemented by a single chip.

其中,存储器905可以包括RAM,也可以包括ROM。可选的,该存储器905包括非瞬时性计算机可读介质。存储器905可用于存储指令、程序、代码、代码集或指令集。存储器905可包括存储程序区和存储数据区,其中,存储程序区可存储用于实现操作系统的指令、用于至少一个功能的指令(比如触控功能、声音播放功能、图像播放功能等)、用于实现上述各个方法实施例的指令等;存储数据区可存储上面各个方法实施例中涉及到的数据等。存储器905可选的还可以是至少一个位于远离前述处理器901的存储装置。如图9所示,作为一种计算机存储介质的存储器905中可以包括操作系统、网络通信模块、用户接口模块以及多任务模型训练应用程序。The memory 905 may include RAM or ROM. Optionally, the memory 905 includes non-transitory computer-readable media. Memory 905 may be used to store instructions, programs, codes, sets of codes, or sets of instructions. The memory 905 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, an image playback function, etc.), Instructions and the like used to implement the above method embodiments; the storage data area may store the data and the like involved in the above method embodiments. The memory 905 can optionally also be at least one storage device located away from the aforementioned processor 901 . As shown in FIG. 9 , the memory 905 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a multi-task model training application program.

具体地,处理器901可以用于调用存储器905中存储的多任务模型训练应用程序,并具体执行以下操作:Specifically, the processor 901 can be used to call the multi-task model training application program stored in the memory 905, and specifically perform the following operations:

确定样本特征在第m个子模型的注意力模块中的参数权重;其中,m为小于M的正整数;Determine the parameter weight of the sample feature in the attention module of the mth sub-model; where m is a positive integer less than M;

根据第m个子模型的注意力模块中的参数权重确定第m+1个子模型的注意力模块中的参数;其中,样本特征包括样本用户特征、样本产品特征以及用户对于第m+1个子模型对应的任务的样本结果;The parameters in the attention module of the m+1 th sub-model are determined according to the parameter weights in the attention module of the m th sub-model; wherein, the sample features include sample user features, sample product features, and the user's correspondence to the m+1 th sub-model. sample results of the task;

基于样本特征、第m个子模型的注意力模块中的参数权重以及第m+1个子模型的注意力模块中的参数对多任务模型进行训练。The multi-task model is trained based on the sample features, the parameter weights in the attention module of the mth submodel, and the parameters in the attention module of the m+1th submodel.

在一些可能的实施例中,第m个子模型还包括嵌入模块、组合模块以及第一转换模块;In some possible embodiments, the mth sub-model further includes an embedding module, a combining module and a first converting module;

处理器901确定样本特征在第m个子模型的注意力模块中的参数权重时,具体执行:When the processor 901 determines the parameter weight of the sample feature in the attention module of the mth sub-model, it specifically executes:

将样本用户特征以及样本产品特征输入至嵌入模块,得到分别与样本用户特征以及样本产品特征对应的特征向量;Input the sample user features and sample product features into the embedding module to obtain feature vectors corresponding to the sample user features and sample product features respectively;

将分别与样本用户特征以及样本产品特征对应的特征向量输入至组合模块,得到组合特征向量;Input the feature vectors corresponding to the sample user features and the sample product features respectively into the combination module to obtain the combined feature vector;

将组合特征向量输入至第一转换模块,得到第一转换信息;inputting the combined feature vector into the first conversion module to obtain the first conversion information;

将第一转换信息输入至第m个子模型的注意力模块,得到样本特征在第m个子模型的参数权重。The first conversion information is input to the attention module of the mth sub-model, and the parameter weight of the sample feature in the mth sub-model is obtained.

在一些可能的实施例中,第m个子模型的注意力模块与第m+1个子模型的注意力模块之间设置有全连接模块;In some possible embodiments, a fully connected module is set between the attention module of the mth sub-model and the attention module of the m+1th sub-model;

处理器901根据第m个子模型的注意力模块中的参数权重确定第m+1个子模型的注意力模块中的参数时,具体执行:When the processor 901 determines the parameters in the attention module of the m+1 th sub-model according to the parameter weights in the attention module of the m th sub-model, it specifically executes:

将第m个子模型的注意力模块中的参数权重输入至全连接模块,得到包含第m个子模型的注意力模块中的参数权重的连接信息;Input the parameter weights in the attention module of the mth sub-model to the fully connected module, and obtain the connection information including the parameter weights in the attention module of the mth sub-model;

将连接信息输入至第m+1个子模型的注意力模块,得到第m+1个子模型的注意力模块中的参数。Input the connection information to the attention module of the m+1 th sub-model, and get the parameters in the attention module of the m+1 th sub-model.

在一些可能的实施例中,第m+1个子模型还包括嵌入模块、组合模块以及第二转换模块;In some possible embodiments, the m+1 th sub-model further includes an embedding module, a combining module and a second converting module;

处理器901基于样本特征、第m个子模型的注意力模块中的参数权重以及第m+1个子模型的注意力模块中的参数对多任务模型进行训练时,具体执行:When the processor 901 trains the multi-task model based on the sample features, the parameter weights in the attention module of the m th sub-model, and the parameters in the attention module of the m+1 th sub-model, it specifically executes:

将样本用户特征以及样本产品特征输入至嵌入模块,得到分别与样本用户特征以及样本产品特征对应的特征向量;Input the sample user features and sample product features into the embedding module to obtain feature vectors corresponding to the sample user features and sample product features respectively;

将分别与样本用户特征以及样本产品特征对应的特征向量输入至组合模块,得到组合特征向量;Input the feature vectors corresponding to the sample user features and the sample product features respectively into the combination module to obtain the combined feature vector;

将组合特征向量输入至第二转换模块,得到第二转换信息;inputting the combined feature vector into the second conversion module to obtain second conversion information;

将第二转换信息输入至第m+1个子模型的注意力模块，并根据第二转换信息以及第m+1个子模型的注意力模块中的参数得到第m+1个子模型对应的任务的预测结果；Inputting the second conversion information into the attention module of the (m+1)-th sub-model, and obtaining the prediction result of the task corresponding to the (m+1)-th sub-model according to the second conversion information and the parameters in the attention module of the (m+1)-th sub-model;

根据第m+1个子模型对应的任务的预测结果以及用户对于第m+1个子模型对应的任务的样本结果,对多任务模型进行训练。The multi-task model is trained according to the prediction result of the task corresponding to the m+1 th sub-model and the user's sample result of the task corresponding to the m+1 th sub-model.

在一些可能的实施例中,处理器901根据第二转换信息以及第m+1个子模型的参数得到第m+1个子模型对应的任务的预测结果之后,还用于执行:In some possible embodiments, after obtaining the prediction result of the task corresponding to the m+1 th sub-model according to the second conversion information and the parameters of the m+1 th sub-model, the processor 901 is further configured to execute:

根据第m+1个子模型对应的任务的预测结果以及样本特征,得到第一损失函数;Obtain the first loss function according to the prediction result of the task corresponding to the m+1 th sub-model and the sample characteristics;

基于第一损失函数对第m+1个子模型进行优化。The m+1 th sub-model is optimized based on the first loss function.

在一些可能的实施例中，处理器901在根据第m+1个子模型对应的任务的预测结果以及样本特征，得到第一损失函数之后，在基于第一损失函数对第m+1个子模型进行优化之前，还用于执行：In some possible embodiments, after obtaining the first loss function according to the prediction result of the task corresponding to the (m+1)-th sub-model and the sample features, and before optimizing the (m+1)-th sub-model based on the first loss function, the processor 901 is further configured to execute:

判断第m+1个子模型对应的任务的预测结果是否满足预设条件;Determine whether the prediction result of the task corresponding to the m+1th sub-model satisfies the preset condition;

在确定第m+1个子模型对应的任务的预测结果不满足预设条件时,根据第一转换信息以及第m个子模型的注意力模块中的参数得到第m个子模型对应的任务的预测结果;When it is determined that the prediction result of the task corresponding to the m+1 th sub-model does not meet the preset condition, obtain the prediction result of the task corresponding to the m th sub-model according to the first conversion information and the parameters in the attention module of the m th sub-model;

根据第m个子模型对应的任务的预测结果、第m+1个子模型对应的任务的预测结果以及样本特征,得到第二损失函数;Obtain the second loss function according to the prediction result of the task corresponding to the mth submodel, the prediction result of the task corresponding to the m+1th submodel, and the sample characteristics;

基于第一损失函数对第m+1个子模型进行优化,包括:Optimize the m+1 th sub-model based on the first loss function, including:

基于第一损失函数以及第二损失函数对第m+1个子模型进行优化。The m+1 th sub-model is optimized based on the first loss function and the second loss function.

在一些可能的实施例中,处理器901根据第一转换信息以及第m个子模型的权重参数得到第m个子模型对应的任务的预测结果之后,还用于执行:In some possible embodiments, after the processor 901 obtains the prediction result of the task corresponding to the m-th sub-model according to the first conversion information and the weight parameter of the m-th sub-model, the processor 901 is further configured to execute:

根据第m个子模型对应的任务的预测结果以及样本特征,得到第三损失函数;According to the prediction result of the task corresponding to the mth sub-model and the sample characteristics, the third loss function is obtained;

基于第三损失函数对第m个子模型进行优化。The m-th sub-model is optimized based on the third loss function.

请参阅图10,图10示出了本说明书实施例提供的又一种多任务预测装置的结构示意图。Please refer to FIG. 10. FIG. 10 shows a schematic structural diagram of another multi-task prediction apparatus provided by an embodiment of the present specification.

该多任务预测装置1000应用于多任务模型,该多任务模型包括M个子模型,每个子模型分别对应一个任务,每个子模型分别包括一个注意力模块。如图10所示,该多任务预测装置1000还可以包括:至少一个处理器1001、至少一个网络接口1004、用户接口1003、存储器1005以及至少一个通信总线1002。The multi-task prediction apparatus 1000 is applied to a multi-task model. The multi-task model includes M sub-models, each sub-model corresponds to a task, and each sub-model includes an attention module. As shown in FIG. 10 , the multi-task prediction apparatus 1000 may further include: at least one processor 1001 , at least one network interface 1004 , user interface 1003 , memory 1005 and at least one communication bus 1002 .

其中,通信总线1002可用于实现上述各个组件的连接通信。Wherein, the communication bus 1002 can be used to realize the connection and communication of the above components.

其中,用户接口1003可以包括按键,可选用户接口还可以包括标准的有线接口、无线接口。The user interface 1003 may include buttons, and the optional user interface may also include a standard wired interface and a wireless interface.

其中,网络接口1004可以但不局限于包括蓝牙模块、NFC模块、Wi-Fi模块等。Wherein, the network interface 1004 may include, but is not limited to, a Bluetooth module, an NFC module, a Wi-Fi module, and the like.

其中,处理器1001可以包括一个或者多个处理核心。处理器1001利用各种接口和线路连接整个多任务预测装置1000内的各个部分,通过运行或执行存储在存储器1005内的指令、程序、代码集或指令集,以及调用存储在存储器1005内的数据,执行多任务预测装置1000的各种功能和处理数据。可选的,处理器1001可以采用DSP、FPGA、PLA中的至少一种硬件形式来实现。处理器1001可集成CPU、GPU和调制解调器等中的一种或几种的组合。其中,CPU主要处理操作系统、用户界面和应用程序等;GPU用于负责显示屏所需要显示的内容的渲染和绘制;调制解调器用于处理无线通信。可以理解的是,上述调制解调器也可以不集成到处理器1001中,单独通过一块芯片进行实现。The processor 1001 may include one or more processing cores. The processor 1001 uses various interfaces and lines to connect various parts of the entire multitasking prediction apparatus 1000, and by running or executing the instructions, programs, code sets or instruction sets stored in the memory 1005, and calling the data stored in the memory 1005. , which executes various functions of the multitasking prediction apparatus 1000 and processes data. Optionally, the processor 1001 may be implemented in at least one hardware form among DSP, FPGA, and PLA. The processor 1001 may integrate one or a combination of a CPU, a GPU, a modem, and the like. Among them, the CPU mainly handles the operating system, user interface, and application programs; the GPU is used to render and draw the content that needs to be displayed on the display screen; the modem is used to handle wireless communication. It can be understood that, the above-mentioned modem may not be integrated into the processor 1001, but is implemented by a single chip.

其中,存储器1005可以包括RAM,也可以包括ROM。可选的,该存储器1005包括非瞬时性计算机可读介质。存储器1005可用于存储指令、程序、代码、代码集或指令集。存储器1005可包括存储程序区和存储数据区,其中,存储程序区可存储用于实现操作系统的指令、用于至少一个功能的指令(比如触控功能、声音播放功能、图像播放功能等)、用于实现上述各个方法实施例的指令等;存储数据区可存储上面各个方法实施例中涉及到的数据等。存储器1005可选的还可以是至少一个位于远离前述处理器1001的存储装置。如图10所示,作为一种计算机存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及多任务预测应用程序。The memory 1005 may include RAM or ROM. Optionally, the memory 1005 includes non-transitory computer-readable media. Memory 1005 may be used to store instructions, programs, codes, sets of codes, or sets of instructions. The memory 1005 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, an image playback function, etc.), Instructions and the like used to implement the above method embodiments; the storage data area may store the data and the like involved in the above method embodiments. Optionally, the memory 1005 may also be at least one storage device located away from the aforementioned processor 1001 . As shown in FIG. 10 , the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module and a multitasking prediction application program.

具体地,处理器1001可以用于调用存储器1005中存储的多任务预测应用程序,并具体执行以下操作:Specifically, the processor 1001 can be used to call the multitasking prediction application program stored in the memory 1005, and specifically perform the following operations:

确定目标特征在第m个子模型的注意力模块中的参数权重;其中,m为小于M的正整数;Determine the parameter weight of the target feature in the attention module of the mth sub-model; where m is a positive integer less than M;

根据第m个子模型的注意力模块中的参数权重确定第m+1个子模型的注意力模块中的参数;其中,目标特征包括目标用户特征以及目标产品特征;The parameters in the attention module of the m+1 th sub-model are determined according to the parameter weights in the attention module of the m th sub-model; wherein, the target features include target user features and target product features;

基于目标特征以及第m+1个子模型的注意力模块中的参数,得到第m+1个子模型对应任务的预测结果。Based on the target feature and the parameters in the attention module of the m+1 th sub-model, the prediction result of the task corresponding to the m+1 th sub-model is obtained.
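The three prediction operations above can be chained into one end-to-end sketch: embed and combine the target features, derive the (m+1)-th attention parameters from the m-th sub-model's weights, then convert and score. Every dimension and weight here is a toy assumption, not the patent's configuration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Toy stand-ins for the embedded target user / product features.
user_vec = rng.normal(size=2)
product_vec = rng.normal(size=3)
combined = np.concatenate([user_vec, product_vec])  # combination module

w_convert = rng.normal(scale=0.1, size=(5, 4))   # second conversion module
attn_m_weights = rng.normal(scale=0.1, size=3)   # weights from sub-model m
fc = rng.normal(scale=0.1, size=(3, 4))          # fully connected module

# Step 1: sub-model m+1's attention parameters from sub-model m's weights.
attn_params = attn_m_weights @ fc
# Step 2: conversion information, then the sigmoid-based prediction.
conversion_info = combined @ w_convert
prediction = sigmoid(conversion_info @ attn_params)
```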

本说明书实施例还提供了一种计算机可读存储介质，该计算机可读存储介质中存储有指令，当其在计算机或处理器上运行时，使得计算机或处理器执行上述图3或图6所示实施例中的一个或多个步骤。上述电子设备的各组成模块如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在所述计算机可读取存储介质中。The embodiments of the present specification further provide a computer-readable storage medium storing instructions which, when run on a computer or a processor, cause the computer or the processor to perform one or more steps of the embodiments shown in FIG. 3 or FIG. 6 above. If the component modules of the above electronic device are implemented in the form of software functional units and sold or used as independent products, they may be stored in the computer-readable storage medium.

在上述实施例中，可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时，可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时，全部或部分地产生按照本说明书实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中，或者通过所述计算机可读存储介质进行传输。所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(Digital Subscriber Line，DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如，软盘、硬盘、磁带)、光介质(例如，数字多功能光盘(Digital Versatile Disc，DVD))、或者半导体介质(例如，固态硬盘(Solid State Disk，SSD))等。In the above embodiments, the implementation may be wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present specification are produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in, or transmitted via, a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., Digital Versatile Disc (DVD)), or semiconductor media (e.g., Solid State Disk (SSD)), etc.

本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程，可以通过计算机程序来指令相关的硬件来完成，该程序可存储于计算机可读取存储介质中，该程序在执行时，可包括如上述各方法的实施例的流程。而前述的存储介质包括：ROM、RAM、磁碟或者光盘等各种可存储程序代码的介质。在不冲突的情况下，本实施例和实施方案中的技术特征可以任意组合。Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing the relevant hardware through a computer program, and the program can be stored in a computer-readable storage medium; when executed, the program may include the processes of the embodiments of the methods described above. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks. The technical features in the embodiments and implementations may be combined arbitrarily where no conflict arises.

The embodiments described above are only preferred embodiments of this specification and do not limit its scope. Without departing from the design spirit of this specification, any modifications and improvements made by those of ordinary skill in the art to its technical solutions shall fall within the protection scope determined by the claims of this specification.

Claims (13)

1. A multi-task model training method, wherein the multi-task model comprises M sub-models, each sub-model corresponds to one task, and each sub-model comprises an attention module, the method comprising:
determining parameter weights of sample features in the attention module of the m-th sub-model, where m is a positive integer less than M;
determining parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model, wherein the sample features comprise sample user features, sample product features, and a user's sample results for the task corresponding to the (m+1)-th sub-model; and
training the multi-task model based on the sample features, the parameter weights in the attention module of the m-th sub-model, and the parameters in the attention module of the (m+1)-th sub-model.

2. The method according to claim 1, wherein the m-th sub-model further comprises an embedding module, a combination module, and a first conversion module, and determining the parameter weights of the sample features in the attention module of the m-th sub-model comprises:
inputting the sample user features and the sample product features into the embedding module to obtain feature vectors corresponding respectively to the sample user features and the sample product features;
inputting the feature vectors into the combination module to obtain a combined feature vector;
inputting the combined feature vector into the first conversion module to obtain first conversion information; and
inputting the first conversion information into the attention module of the m-th sub-model to obtain the parameter weights of the sample features in the m-th sub-model.

3. The method according to claim 1, wherein a fully connected module is provided between the attention module of the m-th sub-model and the attention module of the (m+1)-th sub-model, and determining the parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model comprises:
inputting the parameter weights in the attention module of the m-th sub-model into the fully connected module to obtain connection information containing those parameter weights; and
inputting the connection information into the attention module of the (m+1)-th sub-model to obtain the parameters in the attention module of the (m+1)-th sub-model.

4. The method according to claim 2, wherein the (m+1)-th sub-model further comprises the embedding module, the combination module, and a second conversion module, and training the multi-task model based on the sample features, the parameter weights in the attention module of the m-th sub-model, and the parameters in the attention module of the (m+1)-th sub-model comprises:
inputting the sample user features and the sample product features into the embedding module to obtain the feature vectors corresponding respectively to the sample user features and the sample product features;
inputting the feature vectors into the combination module to obtain the combined feature vector;
inputting the combined feature vector into the second conversion module to obtain second conversion information;
inputting the second conversion information into the attention module of the (m+1)-th sub-model, and obtaining a prediction result of the task corresponding to the (m+1)-th sub-model according to the second conversion information and the parameters in the attention module of the (m+1)-th sub-model; and
training the multi-task model according to the prediction result of the task corresponding to the (m+1)-th sub-model and the user's sample result for the task corresponding to the (m+1)-th sub-model.

5. The method according to claim 4, wherein after the prediction result of the task corresponding to the (m+1)-th sub-model is obtained according to the second conversion information and the parameters of the (m+1)-th sub-model, the method further comprises:
obtaining a first loss function according to the prediction result of the task corresponding to the (m+1)-th sub-model and the sample features; and
optimizing the (m+1)-th sub-model based on the first loss function.

6. The method according to claim 5, wherein after the first loss function is obtained according to the prediction result of the task corresponding to the (m+1)-th sub-model and the sample features, and before the (m+1)-th sub-model is optimized based on the first loss function, the method further comprises:
judging whether the prediction result of the task corresponding to the (m+1)-th sub-model satisfies a preset condition;
when it is determined that the prediction result of the task corresponding to the (m+1)-th sub-model does not satisfy the preset condition, obtaining a prediction result of the task corresponding to the m-th sub-model according to the first conversion information and the parameters in the attention module of the m-th sub-model; and
obtaining a second loss function according to the prediction result of the task corresponding to the m-th sub-model, the prediction result of the task corresponding to the (m+1)-th sub-model, and the sample features;
and wherein optimizing the (m+1)-th sub-model based on the first loss function comprises:
optimizing the (m+1)-th sub-model based on the first loss function and the second loss function.

7. The method according to claim 6, wherein after the prediction result of the task corresponding to the m-th sub-model is obtained according to the first conversion information and the weight parameters of the m-th sub-model, the method further comprises:
obtaining a third loss function according to the prediction result of the task corresponding to the m-th sub-model and the sample features; and
optimizing the m-th sub-model based on the third loss function.

8. A multi-task prediction method, applied to a multi-task model, wherein the multi-task model comprises M sub-models, each sub-model corresponds to one task, and each sub-model comprises an attention module, the method comprising:
determining parameter weights of target features in the attention module of the m-th sub-model, where m is a positive integer less than M;
determining parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model, wherein the target features comprise target user features and target product features; and
obtaining a prediction result of the task corresponding to the (m+1)-th sub-model based on the target features and the parameters in the attention module of the (m+1)-th sub-model.

9. A multi-task model training apparatus, wherein the multi-task model comprises M sub-models, each sub-model corresponds to one task, and each sub-model comprises an attention module, the apparatus comprising:
a first processing module, configured to determine parameter weights of sample features in the attention module of the m-th sub-model, where m is a positive integer less than M;
a second processing module, configured to determine parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model, wherein the sample features comprise sample user features, sample product features, and a user's sample results for the task corresponding to the (m+1)-th sub-model; and
a training module, configured to train the multi-task model based on the sample features, the parameter weights in the attention module of the m-th sub-model, and the parameters in the attention module of the (m+1)-th sub-model.

10. A multi-task prediction apparatus, applied to a multi-task model, wherein the multi-task model comprises M sub-models, each sub-model corresponds to one task, and each sub-model comprises an attention module, the apparatus comprising:
a third processing module, configured to determine parameter weights of target features in the attention module of the m-th sub-model, where m is a positive integer less than M;
a fourth processing module, configured to determine parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model, wherein the target features comprise target user features and target product features; and
a prediction module, configured to obtain a prediction result of the task corresponding to the (m+1)-th sub-model based on the target features and the parameters in the attention module of the (m+1)-th sub-model.

11. A multi-task model training apparatus, comprising a processor and a memory, wherein the processor is connected to the memory, the memory is configured to store executable program code, and the processor reads the executable program code stored in the memory and runs a program corresponding to the executable program code, so as to perform the method according to any one of claims 1-7.

12. A multi-task prediction apparatus, comprising a processor and a memory, wherein the processor is connected to the memory, the memory is configured to store executable program code, and the processor reads the executable program code stored in the memory and runs a program corresponding to the executable program code, so as to perform the method according to claim 8.

13. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
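As a concrete illustration of the chained attention mechanism recited in claims 1-4 and 8, the following is a minimal NumPy sketch: each sub-model shares the embedding and combination modules, applies its own conversion and attention modules, and a fully connected module passes the attention weights of sub-model m into the attention parameters of sub-model m+1. The module names, layer sizes, activations, and output heads are all illustrative assumptions; the claims do not fix these details.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ChainedAttentionMultiTask:
    def __init__(self, n_user, n_item, emb_dim, hidden, n_tasks):
        self.n_tasks = n_tasks
        # Shared embedding module for user and product features (claim 2).
        self.user_emb = rng.normal(0, 0.1, (n_user, emb_dim))
        self.item_emb = rng.normal(0, 0.1, (n_item, emb_dim))
        # Per-task conversion modules (first/second conversion, claims 2 and 4).
        self.conv_w = [rng.normal(0, 0.1, (2 * emb_dim, hidden)) for _ in range(n_tasks)]
        # Per-task attention modules producing parameter weights over hidden units.
        self.att_w = [rng.normal(0, 0.1, (hidden, hidden)) for _ in range(n_tasks)]
        # Fully connected modules linking attention m -> attention m+1 (claim 3).
        self.fc_w = [rng.normal(0, 0.1, (hidden, hidden)) for _ in range(n_tasks - 1)]
        # Per-task output heads (an assumption; the claims leave the head open).
        self.out_w = [rng.normal(0, 0.1, hidden) for _ in range(n_tasks)]

    def forward(self, user_id, item_id):
        # Combination module: concatenate the two embedded feature vectors.
        combined = np.concatenate([self.user_emb[user_id], self.item_emb[item_id]])
        preds, prev_att = [], None
        for m in range(self.n_tasks):
            # Conversion module: combined features -> conversion information.
            h = np.tanh(combined @ self.conv_w[m])
            logits = h @ self.att_w[m]
            if prev_att is not None:
                # Connection information from task m-1 modulates the attention
                # parameters of task m (claims 1 and 3).
                logits = logits + prev_att @ self.fc_w[m - 1]
            att = softmax(logits)          # parameter weights of this sub-model
            preds.append(sigmoid((att * h) @ self.out_w[m]))
            prev_att = att                 # chained to the next sub-model
        return preds

model = ChainedAttentionMultiTask(n_user=10, n_item=20, emb_dim=8, hidden=16, n_tasks=3)
preds = model.forward(user_id=3, item_id=7)
print([round(float(p), 4) for p in preds])  # one prediction per task, each in (0, 1)
```

Training along the lines of claims 5-7 would attach a loss to each task's prediction (the first, second, and third loss functions) and optimize the corresponding sub-models; that part is omitted from this sketch.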
CN202210552195.5A 2022-05-20 2022-05-20 Multi-task model training method, multi-task prediction method, related device and medium Active CN115049108B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210552195.5A CN115049108B (en) 2022-05-20 2022-05-20 Multi-task model training method, multi-task prediction method, related device and medium

Publications (2)

Publication Number Publication Date
CN115049108A (en) 2022-09-13
CN115049108B (en) 2024-11-05

Family

ID=83159442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210552195.5A Active CN115049108B (en) 2022-05-20 2022-05-20 Multi-task model training method, multi-task prediction method, related device and medium

Country Status (1)

Country Link
CN (1) CN115049108B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190130275A1 (en) * 2017-10-26 2019-05-02 Magic Leap, Inc. Gradient normalization systems and methods for adaptive loss balancing in deep multitask networks
WO2019085793A1 (en) * 2017-11-01 2019-05-09 腾讯科技(深圳)有限公司 Image classification method, computer device and computer readable storage medium
CN110533180A (en) * 2019-07-15 2019-12-03 北京地平线机器人技术研发有限公司 Network structure searching method and device, readable storage medium storing program for executing, electronic equipment
CN111339415A (en) * 2020-02-25 2020-06-26 中国科学技术大学 A CTR Prediction Method and Device Based on Multi-Interactive Attention Network
US20200356724A1 (en) * 2019-05-06 2020-11-12 University Of Electronic Science And Technology Of China Multi-hop attention and depth model, method, storage medium and terminal for classification of target sentiments
WO2021007812A1 (en) * 2019-07-17 2021-01-21 深圳大学 Deep neural network hyperparameter optimization method, electronic device and storage medium
CN112508265A (en) * 2020-12-02 2021-03-16 中国极地研究中心 Time and activity multi-task prediction method and system for business process management
CN112837106A (en) * 2019-11-22 2021-05-25 上海哔哩哔哩科技有限公司 Commodity recommendation method and device and computer equipment
CN112949842A (en) * 2021-05-13 2021-06-11 北京市商汤科技开发有限公司 Neural network structure searching method, apparatus, computer device and storage medium
CN113159945A (en) * 2021-03-12 2021-07-23 华东师范大学 Stock fluctuation prediction method based on multitask self-supervision learning
CN113569732A (en) * 2021-07-27 2021-10-29 厦门理工学院 Face attribute recognition method and system based on parallel shared multi-task network
CN113706347A (en) * 2021-08-31 2021-11-26 深圳壹账通智能科技有限公司 Multitask model distillation method, multitask model distillation system, multitask model distillation medium and electronic terminal

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116910373A (en) * 2023-09-12 2023-10-20 深圳须弥云图空间科技有限公司 House source recommendation method and device, electronic equipment and storage medium
CN116910373B (en) * 2023-09-12 2023-12-12 深圳须弥云图空间科技有限公司 House source recommendation method and device, electronic equipment and storage medium
CN117236404A (en) * 2023-09-18 2023-12-15 北京百度网讯科技有限公司 Task processing and model training methods, image processing methods and devices

Also Published As

Publication number Publication date
CN115049108B (en) 2024-11-05

Similar Documents

Publication Publication Date Title
CN112541124B (en) Methods, devices, equipment, media and program products for generating multi-task models
CN112712801B (en) Voice wakeup method and device, electronic equipment and storage medium
CN116127020A (en) Method for training generated large language model and searching method based on model
CN112699305A (en) Multi-target recommendation method, device, computing equipment and medium
WO2019062413A1 (en) Method and apparatus for managing and controlling application program, storage medium, and electronic device
CN111666416B (en) Method and device for generating semantic matching model
CN115049108B (en) Multi-task model training method, multi-task prediction method, related device and medium
CN113344647B (en) Information recommendation method and device
CN113934851A (en) Data enhancement method, device and electronic device for text classification
CN115239409A (en) Method and system for sequence recommendation information selection based on multi-agent reinforcement learning
CN114565460A (en) An information push method and related equipment based on delayed conversion prediction model
CN117474084A (en) Bidirectional iteration method, equipment and medium for pre-training model and downstream sequence task
WO2024114659A1 (en) Summary generation method and related device
CN111770125A (en) Method and apparatus for pushing information
CN111695967A (en) Method, device, equipment and storage medium for determining quotation
WO2025180354A1 (en) Prediction of product feature data and training of prediction model
CN115808993A (en) Interaction method, interaction device, electronic equipment and computer readable medium
CN114936881A (en) Model training and conversion rate determining method, device, equipment and storage medium
CN115269978A (en) Video tag generation method, device, equipment and medium
CN118535566A (en) Front-end form data maintenance method and device
CN110503482B (en) Article processing method, device, terminal and storage medium
CN115543638B (en) Uncertainty-based edge calculation data collection and analysis method, system and equipment
CN117577097A (en) Model training method, device, electronic equipment and medium
CN114819146B (en) Recommendation model training method and display content determination method, device, and medium
US12136413B1 (en) Domain-specific parameter pre-fixes for tuning automatic speech recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant