
CN114781620A - Data processing model construction method, device, equipment and storage medium - Google Patents

Data processing model construction method, device, equipment and storage medium

Info

Publication number
CN114781620A
CN114781620A (application CN202210479401.4A)
Authority
CN
China
Prior art keywords
operator
data processing
optimized
processing model
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210479401.4A
Other languages
Chinese (zh)
Inventor
茆廷志
万根顺
高建清
刘聪
胡国平
刘庆峰
胡郁
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202210479401.4A
Publication of CN114781620A


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a data processing model construction method, device, equipment and storage medium, wherein the method comprises the following steps: determining, from a first data processing model running on a first operation main body, an operator which is not supported by a second operation main body, and taking it as an operator to be optimized; constructing, from operators supported by the second operation main body, an optimization operator corresponding to the operator to be optimized, so that the optimization operator realizes the same function as the corresponding operator to be optimized and is supported by the second operation main body; and replacing the operator to be optimized in the first data processing model with the optimization operator to obtain a second data processing model. The second data processing model therefore has the same function and framework as the first data processing model and can be deployed to the second operation main body, so that data processing models with the same function and framework can be deployed to operation main bodies with different computing resources, which improves the practicability of the data processing model and relieves the development pressure on the data processing model.

Description

Data processing model construction method, device, equipment and storage medium
Technical Field
The present application relates to the field of model construction technologies, and in particular, to a method, an apparatus, a device, and a storage medium for constructing a data processing model.
Background
Data processing services typically deploy a pre-built data processing model (e.g., a neural network model) to a runtime agent to enable the runtime agent to utilize the data processing model to implement the data processing service. Different execution agents have different hardware resources and processing capabilities, and therefore, the same data processing model may not be deployed to different execution agents at the same time.
For example, in a speech recognition scenario, due to the rich computing resources of the cloud, speech recognition services are usually implemented by deploying a speech recognition system in the cloud. However, it is desirable to deploy speech recognition services on the device side (i.e., the end-side) due to factors such as user privacy, bandwidth delay, network instability, etc. However, the computing resources of the device side are significantly different from those of the cloud side, and operators of the speech recognition system running in the cloud side may not be supported in the device side, so that the speech recognition system in the cloud side cannot be deployed to the device side application, and the speech recognition system needs to be redeveloped for the device side.
Therefore, data processing models with the same function and framework cannot be deployed to operation subjects with different computing resources, so that the practicability of the data processing models is low, and the development pressure on the data processing models cannot be relieved.
Disclosure of Invention
In view of the defects and shortcomings of the prior art, the present application provides a data processing model construction method, device, equipment and storage medium, which can improve the practicability of the data processing model and relieve the development pressure on the data processing model.
A data processing model construction method comprises the following steps:
determining an operator to be optimized from a first data processing model running on a first running main body; the operator to be optimized is an operator which is not supported by the second operation main body;
an operator supported by the second operation main body is utilized to construct an optimization operator corresponding to the operator to be optimized, and the optimization operator can realize the same function as the corresponding operator to be optimized;
and replacing the operator to be optimized in the first data processing model by using the optimization operator to obtain a second data processing model.
Optionally, the determining an operator to be optimized from a first data processing model running on a first running main body includes:
operator structure disassembly is carried out on the first data processing model, and all operators in the first data processing model are determined;
and analyzing all operators in the first data processing model according to the computing resources of the second operation main body, and determining the operators which are not supported by the second operation main body in the first data processing model as the operators to be optimized.
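As a rough illustration, the two steps above amount to enumerating the model's operators and taking a set difference against the operators the second operation main body supports. The sketch below is hypothetical (the operator names and the `find_operators_to_optimize` helper are illustrative, not from the patent):

```python
def find_operators_to_optimize(model_operators, supported_operators):
    """Return the operators of the first data processing model that the
    second operation main body does not support, preserving model order."""
    supported = set(supported_operators)
    return [op for op in model_operators if op not in supported]

# Illustrative operator lists for a cloud speech model vs. a device end.
cloud_model_ops = ["conv2d", "relu", "layer_norm", "multi_head_attention", "softmax"]
device_supported_ops = ["conv2d", "relu", "batch_norm", "softmax", "dense"]

to_optimize = find_operators_to_optimize(cloud_model_ops, device_supported_ops)
print(to_optimize)  # ['layer_norm', 'multi_head_attention']
```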
Optionally, the step of constructing an optimization operator corresponding to the operator to be optimized by using the operator supported by the second operation subject includes:
and constructing and obtaining an optimization operator corresponding to the operator to be optimized by combining or optimizing the operators supported by the second operation main body.
Optionally, the constructing and obtaining an optimization operator corresponding to the operator to be optimized by combining or optimizing the operators supported by the second operation subject includes:
determining a first type of operator to be optimized and a second type of operator to be optimized from each operator to be optimized; the first type operator to be optimized is an operator which is not supported by the second operation main body, and the second type operator to be optimized is an operator which has a different data processing mode from an operator of the same type supported by the second operation main body;
the operators supported by the second operation main body are combined to construct an operator with the same function as the first type of operator to be optimized, and an optimization operator corresponding to the first type of operator to be optimized is obtained;
and optimizing an operator which is supported by the second operation main body and has the same type as the operator to be optimized of the second type to obtain an optimization operator corresponding to the operator to be optimized of the second type.
Optionally, the first type of operator to be optimized is an attention mechanism operator or a layer normalization operator;
the step of constructing an operator with the same function as the first type of operator to be optimized by combining operators supported by the second operation main body to obtain an optimization operator corresponding to the first type of operator to be optimized includes:
if the first type of operator to be optimized is an attention mechanism operator, constructing by using a neural network, a batch normalization operator and an activation function which are matched with the functions of the attention mechanism operator to obtain a semi-optimization operator corresponding to the attention mechanism operator;
integrating batch normalization operators in the semi-optimization operators into convolution operators to obtain optimization operators corresponding to the attention mechanism operators;
or,
if the first type operator to be optimized is a layer normalization operator, integrating batch normalization operators matched with the layer normalization operator into a convolution operator to obtain an optimization operator with the same function as the layer normalization operator;
wherein the neural network, the convolution operator, and the activation function are all operators supported by the second operational agent.
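The "integrating batch normalization operators into convolution operators" step can be read as the standard batch-norm folding identity: for a convolution y = Wx + b followed by BN with statistics (mu, var) and parameters (gamma, beta), the fused parameters are W' = W * gamma/sqrt(var + eps) and b' = (b - mu) * gamma/sqrt(var + eps) + beta. A minimal numpy sketch with a 1x1 convolution, under the assumption that this standard identity is what the patent intends (the patent gives no formulas; all names are illustrative):

```python
import numpy as np

def conv1x1(x, W, b):
    """1x1 convolution over channels: x is (c_in, n), W is (c_out, c_in)."""
    return W @ x + b[:, None]

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Per-output-channel batch normalization with fixed statistics."""
    inv = gamma / np.sqrt(var + eps)
    return inv[:, None] * (y - mean[:, None]) + beta[:, None]

def fold_bn_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold the BN parameters into the convolution's weight and bias."""
    scale = gamma / np.sqrt(var + eps)
    return W * scale[:, None], (b - mean) * scale + beta

rng = np.random.default_rng(0)
c_in, c_out, n = 3, 4, 8
x = rng.normal(size=(c_in, n))
W, b = rng.normal(size=(c_out, c_in)), rng.normal(size=c_out)
gamma, beta = rng.normal(size=c_out), rng.normal(size=c_out)
mean, var = rng.normal(size=c_out), rng.uniform(0.5, 2.0, size=c_out)

W_f, b_f = fold_bn_into_conv(W, b, gamma, beta, mean, var)
# The folded convolution reproduces conv followed by batch norm exactly.
assert np.allclose(batch_norm(conv1x1(x, W, b), gamma, beta, mean, var),
                   conv1x1(x, W_f, b_f))
```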
Optionally, the second type of operator to be optimized is an asymmetric convolution operator;

the obtaining of the optimization operator corresponding to the second type of operator to be optimized by optimizing the operator of the same type as the second type of operator to be optimized, which is supported by the second operation subject, includes:
adding a data dimension processing module to the symmetric convolution operator supported by the second operation main body to obtain an optimization operator corresponding to the asymmetric convolution operator;
the data dimension processing module is configured to adjust an input data dimension of the symmetric convolution operator, so that the input data dimension meets the requirement of the symmetric convolution operator.
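One plausible reading of the data dimension processing module: embed the asymmetric kernel (e.g., 1x3) in a symmetric kxk kernel padded with zeros, and pad the input's height so that the symmetric convolution yields the same output. The naive numpy sketch below assumes that reading; the padding scheme is illustrative, as the patent does not specify one:

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive 'valid' 2-D cross-correlation."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def asymmetric_via_symmetric(x, k_1xk):
    """Run a 1xk asymmetric kernel through a kxk symmetric convolution by
    zero-padding the kernel and adjusting the input data dimension."""
    kw = k_1xk.shape[1]            # assumed odd, e.g. 3
    k_sym = np.zeros((kw, kw))
    k_sym[kw // 2, :] = k_1xk[0]   # asymmetric kernel sits in the middle row
    pad = kw // 2
    x_adj = np.pad(x, ((pad, pad), (0, 0)))  # the dimension processing step
    return conv2d_valid(x_adj, k_sym)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 7))
k = rng.normal(size=(1, 3))
# The symmetric convolution on the adjusted input matches the asymmetric one.
assert np.allclose(asymmetric_via_symmetric(x, k), conv2d_valid(x, k))
```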
Optionally, the method further includes:
training the second data processing model.
Optionally, training the second data processing model includes:
acquiring model training data and a data processing reference model corresponding to the second data processing model;
performing multi-task joint training on the second data processing model and the data processing reference model by using the model training data;
and the data processing reference model is a data processing model with performance higher than that of the second data processing model.
Optionally, the model training data is speech recognition training data;
the performing multi-task joint training on the second data processing model and the data processing reference model by using the model training data includes:
inputting the speech recognition training data into the data processing reference model and the second data processing model, respectively, so that the data processing reference model performs non-streaming recognition on the speech recognition training data, and so that the second data processing model performs streaming recognition on the speech recognition training data;
calculating a first loss function corresponding to the second data processing model and a second loss function corresponding to the data processing reference model;
and utilizing the first loss function and the second loss function to carry out parameter adjustment on the second data processing model until the first loss function and the second loss function are within a preset loss range.
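The joint-training loop can be pictured with tiny linear models standing in for the two recognizers. Everything in this sketch is an illustrative assumption, not the patent's actual recipe: the models, the learning rate, and in particular the interpretation of the second loss as a match-the-reference (distillation-style) term:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 4))                 # stand-in for speech features
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true                               # stand-in for reference labels

w_reference = w_true + 0.01                  # higher-performance reference model (frozen)
w_student = np.zeros(4)                      # the second data processing model

for step in range(500):
    pred_s, pred_r = X @ w_student, X @ w_reference
    first_loss = np.mean((pred_s - y) ** 2)        # loss of the second model
    second_loss = np.mean((pred_s - pred_r) ** 2)  # loss against the reference model
    if first_loss < 1e-4 and second_loss < 1e-4:   # preset loss range
        break
    # Parameter adjustment of the second model using both losses.
    grad = 2.0 * X.T @ ((pred_s - y) + (pred_s - pred_r)) / len(y)
    w_student -= 0.05 * grad

# The student settles between the labels and the reference model's behaviour.
```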
Optionally, before performing the multitask joint training on the second data processing model and the data processing reference model by using the model training data, the method further includes:
and carrying out speaker information normalization processing on the voice recognition training data.
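Speaker information normalization is commonly realized as per-speaker mean and variance normalization of the features (CMVN); the sketch below assumes that reading, since the patent does not name a specific technique:

```python
import numpy as np

def per_speaker_normalize(features, speaker_ids):
    """Normalize each speaker's features to zero mean and unit variance,
    removing coarse speaker information from the training data."""
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    out = np.empty_like(features)
    for spk in np.unique(speaker_ids):
        idx = speaker_ids == spk
        mu = features[idx].mean(axis=0)
        sigma = features[idx].std(axis=0) + 1e-8   # guard against zero variance
        out[idx] = (features[idx] - mu) / sigma
    return out

rng = np.random.default_rng(0)
feats = rng.normal(loc=[5.0, -3.0], scale=2.0, size=(100, 2))
spk = np.array([0] * 50 + [1] * 50)
normed = per_speaker_normalize(feats, spk)
```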
A data processing model construction apparatus comprising:
the operator determining unit is used for determining an operator to be optimized from a first data processing model running on the first running main body; the operator to be optimized is an operator which is not supported by the second operation main body;
the operator construction unit is used for constructing an optimization operator corresponding to the operator to be optimized by using an operator supported by the second operation main body, and the optimization operator can realize the same function as the corresponding operator to be optimized;
and the operator replacing unit is used for replacing the operator to be optimized in the first data processing model by using the optimization operator to obtain a second data processing model.
A data processing model building apparatus comprising:
a memory and a processor;
wherein the memory is connected with the processor and used for storing programs;
the processor is used for realizing the data processing model construction method by operating the program in the memory.
A storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described data processing model construction method.
When the data processing model is constructed, an operator which is not supported by a second operation main body is determined from a first data processing model running on a first operation main body and is taken as an operator to be optimized. Operators supported by the second operation main body are then used to construct an optimization operator corresponding to the operator to be optimized, so that the optimization operator realizes the same function as the corresponding operator to be optimized and is supported by the second operation main body. Finally, the operator to be optimized in the first data processing model is replaced with the optimization operator to obtain a second data processing model. The second data processing model thus has the same function and framework as the first data processing model and can be deployed to the second operation main body, so that data processing models with the same function and framework can be deployed to operation main bodies with different computing resources, the practicability of the data processing model is improved, and the development pressure on the data processing model is relieved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a schematic flowchart of a data processing model construction method provided in an embodiment of the present application;
FIG. 2 is a schematic processing flow diagram for determining an operator to be optimized in a first data processing model according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a multi-head attention mechanism module in a Conformer structure according to an embodiment of the present disclosure;
fig. 4 is a schematic processing flow diagram for constructing an optimization operator corresponding to an operator to be optimized according to the embodiment of the present application;
FIG. 5 is a block diagram of a multi-headed sequence memory mechanism module according to an embodiment of the present disclosure;
FIG. 6 is a flowchart illustrating a process of training a second data processing model according to an embodiment of the present application;
FIG. 7 is a process flow diagram of multi-task joint training provided by an embodiment of the present application;
FIG. 8 is a block diagram of a multi-task joint training framework provided by an embodiment of the present application;
fig. 9 is a schematic structural diagram of a data processing model building apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a data processing model building device according to an embodiment of the present application.
Detailed Description
The technical scheme of the embodiment of the application is suitable for application scenes constructed by data processing models. By adopting the technical scheme of the embodiment of the application, the data processing model with the same function and framework can be deployed to the operation main bodies with different computing resources, the practicability of the data processing model is improved, and the development pressure on the data processing model is relieved.
The application of the data processing service generally requires that a data processing model, such as a neural network model, is constructed in advance, and then the data processing model is deployed on a corresponding operation subject, so that the operation subject can implement the data processing service by using the data processing model.
For example, in a voice recognition scenario, the currently popular Conformer structure can be used to construct a voice recognition system, and the voice recognition system is deployed to a cloud, so that a cloud-based voice recognition service is realized. The cloud has rich computing resources and no hardware operation interface limitation, so it can deploy a complex recognition system and provide high recognition accuracy. However, the voice recognition service at the cloud needs to collect the user's voice data and transmit it to the cloud for decoding, so bandwidth delay or network instability may occur, affecting the voice recognition efficiency, and user privacy may also be disclosed, affecting the user experience.
Therefore, in view of factors such as user privacy, bandwidth delay, and network instability, it is desirable to deploy speech recognition services on the device side (i.e., the end side). However, hardware resources of the device end are limited, so its computing resources differ significantly from those of the cloud, and the device end may not completely support the operators of the voice recognition system running in the cloud. As a result, the voice recognition system deployed in the cloud cannot be deployed to the device end, and the system can only be re-developed for the device end, which increases the development pressure of the voice recognition system.
At present, different operation subjects have different hardware resources and processing performances, so that the same data processing model may not be deployed to different operation subjects at the same time, resulting in lower practicability of the data processing model. In order to deploy the data processing model in a different operational principal, the data processing model is redeveloped for the different operational principal, resulting in an increase in the development pressure of the data processing model.
In view of the above deficiencies of the prior art and the actual requirements for constructing a data processing model which has the same function and frame and can be deployed on different operation subjects with different computing resources, the inventors of the present application have made research and experiments to provide a data processing model construction method which can implement the deployment of the data processing model with the same function and frame on the operation subjects with different computing resources, improve the practicability of the data processing model, and relieve the development pressure on the data processing model.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
An embodiment of the present application provides a data processing model building method, which may be exemplarily applied to data processing devices such as a server and an intelligent terminal, and is shown in fig. 1, where the method includes:
s101, determining an operator to be optimized from a first data processing model operated on a first operation main body.
Specifically, the execution agent refers to a processing device capable of executing the data processing model to provide a data processing service, and may specifically be a server, a device, an apparatus, and the like, with the same or different performance or computing resources. The first operation subject is disposed with a first data processing model, and the data processing service of the first operation subject can be realized by using the first data processing model. The first data processing model may be various models that implement data processing functions, such as a speech recognition model, a speech translation model, a speech synthesis model, and the like.
In this embodiment, the computing resources of the first operation subject are superior to those of the second operation subject, and the data processing model built based on the computing resources of the first operation subject may not be suitable for the second operation subject, that is, an operator in the first data processing model operated in the first operation subject may not be supported by the second operation subject, so that the first data processing model cannot be deployed to the second operation subject. The operator is not supported by the second operation subject, and it may be that the storage capacity and/or the computing power of the second operation subject are/is not enough to support the operation of the operator. In order to deploy a model with the same function and framework as those of the first data processing model in the second operation subject, operators which are not supported by the second operation subject need to be found out from all operators in the first data processing model according to the computing resources of the second operation subject, and the operators which are not supported by the second operation subject are used as the operators to be optimized.
For example, in a speech recognition scenario, the first execution subject may be a cloud, the second execution subject may be a device, and the first data processing model may be a speech recognition system constructed by using a Conformer structure. Since the speech recognition system deployed in the cloud uses a Conformer structure, and the device side does not completely support all operators in the Conformer structure, the operators in the Conformer structure that are not supported by the device side need to be determined according to the computing resources of the device side. In addition, the first operation body and the second operation body may be two device ends with different hardware resources; since the hardware resources of the two device ends are different, the speech recognition system of one device end may not be deployable on the other device end, and in this case it is necessary to determine the operators of the speech recognition system that are not supported by the other device end.
S102, an operator supported by the second operation main body is utilized to construct an optimization operator corresponding to the operator to be optimized, and the optimization operator can realize the same function as the corresponding operator to be optimized.
Specifically, because different operation subjects have different computing resources, operators supported by different operation subjects are different, and in order to ensure that the data processing model constructed for the second operation subject has the same function and framework as the first data processing model, an operator not supported by the second operation subject in the first data processing model can be replaced by an operator supported by the second operation subject, and at the same time, the operator still has the same function after replacement. Therefore, operators supported by the second execution principal need to be determined according to the computing resources of the second execution principal. And then, an operator with the same function as the operator to be optimized is constructed by using the operator supported by the second operation main body and is used as the optimization operator corresponding to the operator to be optimized.
Different operators to be optimized have different functions and different manners for constructing the corresponding operators to be optimized, so that when the operators supported by the second operation main body are used for constructing the operators to be optimized, the construction manner required by each operator to be optimized for constructing the corresponding operator to be optimized needs to be determined, the operator matched with the operator to be optimized is selected from the operators supported by the second operation main body, and then the corresponding operator to be optimized is constructed by using the corresponding construction manner. The operator matched with the operator to be optimized is selected from the operators supported by the second operation main body according to the function of the operator to be optimized, so that the constructed optimization operator and the operator to be optimized corresponding to the optimization operator can realize the same function.
For example, in a speech recognition scenario, a first operating subject (i.e., a cloud) operates a speech recognition model using a Conformer structure, and operators supported by a second operating subject (i.e., a device side) are used to optimize the operators to be optimized in the Conformer structure, which are not supported by the second operating subject, so as to obtain optimization operators corresponding to the respective operators to be optimized, so that the optimization operators can be supported by the second operating subject (i.e., the device side).
S103, replacing the operator to be optimized in the first data processing model with the optimization operator to obtain a second data processing model.
Specifically, after the optimization operator corresponding to each operator to be optimized is constructed, each optimization operator needs to be used to replace the operator to be optimized corresponding to each optimization operator in the first data processing model, and the model after operator replacement is used as the second data processing model. Because the second data processing model only replaces some operators in the first data processing model with other operators capable of realizing the same function, the framework of the model is not changed, and the function is not changed, the framework and the function of the first data processing model and the second data processing model are the same, and no operator which is not supported by the second operation main body exists in the second data processing model, the computing resources of the second operation main body can support the operation of the second data processing model, the second data processing model can be deployed in the second operation main body, and the data processing model with the same function and framework can be deployed to operation main bodies with different computing resources.
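Viewed abstractly, step S103 is a lookup-and-swap over the model's operator sequence that leaves the framework untouched. A hypothetical sketch (operator and replacement names are illustrative, not from the patent):

```python
def replace_operators(model_operators, optimization_map):
    """Swap each operator to be optimized for its optimization operator;
    operators the second operation main body already supports pass through."""
    return [optimization_map.get(op, op) for op in model_operators]

first_model = ["conv2d", "layer_norm", "multi_head_attention", "relu"]
optimization_map = {
    # Illustrative optimization operators built from supported operators.
    "layer_norm": "folded_bn_conv",
    "multi_head_attention": "multi_head_sequence_memory",
}
second_model = replace_operators(first_model, optimization_map)
print(second_model)
# ['conv2d', 'folded_bn_conv', 'multi_head_sequence_memory', 'relu']
```

The order of operators, and hence the model framework, is unchanged; only the unsupported operators are substituted.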
When the first operation main body is a cloud end and the second operation main body is an equipment end, the computing resources of the cloud end are richer than those of the equipment end, and operators supported by the equipment end are generally also supported by the cloud end. Therefore, after a second data processing model that can be deployed at the equipment end is constructed, the second data processing model can also be deployed at the cloud end, realizing model sharing between the cloud end and the equipment end. When the second data processing model is optimized and updated, the same updating strategy can thus be adopted, which reduces the development and updating pressure of the data processing model and can increase the iteration frequency of the model.
For example, in a speech recognition scenario, each constructed optimization operator replaces its corresponding operator to be optimized in the Conformer structure, so that all operators in the Conformer structure after operator replacement are operators supported by the second operation subject (i.e., the device end), thereby obtaining a speech recognition model of the Conformer structure after operator replacement. Because the speech recognition models before and after operator replacement both adopt the Conformer structure and only a few operators are changed, the framework of the speech recognition model is not changed.
In addition, when the second data processing model is constructed, in order to reduce the dependence on the computing resources of the equipment end, the operation main body with the most basic computing resources and the worst performance can be taken as the second operation main body, so that the operators in the constructed second data processing model are all basic operators on equipment of the same type as the second operation main body. The equipment end with the larger computing-resource limitation can then deploy the second data processing model, and any other equipment end with a smaller computing-resource limitation can necessarily deploy it as well, so the model does not need to be reconstructed for other equipment ends according to their computing resources. This realizes model sharing among equipment with different performance or different computing-resource levels and relieves the development and updating pressure of the data processing model.
As can be seen from the above description, when constructing a data processing model, the data processing model construction method provided in the embodiment of the present application determines, from the first data processing model running on the first operation body, the operators that are not supported by the second operation body, and uses operators supported by the second operation body to construct an optimization operator corresponding to each operator to be optimized, so that each optimization operator realizes the same function as the corresponding operator to be optimized while being supported by the second operation body. Finally, the operators to be optimized in the first data processing model are replaced with the optimization operators to obtain the second data processing model. The second data processing model therefore has the same function and framework as the first data processing model and can be deployed on the second operation body. In this way, a data processing model with the same function and framework can be deployed on operation bodies with different computing resources, which improves the practicability of the data processing model and relieves the development pressure on it.
As a preferred implementation manner, referring to fig. 2, another embodiment of the present application discloses that the determining an operator to be optimized from a first data processing model executed by a first execution main body includes:
s201, performing operator structure disassembly on the first data processing model, and determining all operators in the first data processing model.
Specifically, in order to determine all operators in the first data processing model, the operator structure of the first data processing model needs to be disassembled. Since different model structures include different modules, all modules in the first data processing model are disassembled first, and then the operators contained in each module are disassembled, so that all operators in the first data processing model can be determined.
For example, in a speech recognition scenario, a cloud-based speech recognition system generally adopts a Conformer structure with high recognition accuracy. The Conformer structure has a sandwich-like shape and is composed of three modules: a feedforward neural network module, a multi-head attention mechanism module and a convolution module, which are connected by residual connections. In order to determine all operators in this framework, the feedforward neural network module, the multi-head attention mechanism module and the convolution module need to be disassembled respectively to determine the operators contained in each module. The feedforward neural network module includes: a linear layer operator (Linear) and an activation function (Swish). The multi-head attention mechanism module includes: a linear layer operator (Linear) and an attention mechanism operator (Self-Attention). The convolution module includes: a layer normalization operator (LayerNorm), a one-dimensional convolution operator (Conv1D), a gated linear unit operator (GLU), a depthwise one-dimensional convolution operator (Depthwise Conv1D) and an activation function (Swish).
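The module-then-operator disassembly of step S201 can be sketched as follows. The nested-dictionary representation and the helper name are illustrative assumptions; only the module and operator names come from the Conformer example above.

```python
# Sketch of step S201: disassemble the first data processing model into
# modules, then each module into operators, to enumerate every operator.
conformer_block = {
    "feedforward_module": ["Linear", "Swish"],
    "multi_head_attention_module": ["Linear", "Self-Attention"],
    "convolution_module": ["LayerNorm", "Conv1D", "GLU",
                           "Depthwise Conv1D", "Swish"],
}

def disassemble(model):
    """Return the deduplicated, order-preserving list of all operators."""
    seen, operators = set(), []
    for module, ops in model.items():
        for op in ops:
            if op not in seen:
                seen.add(op)
                operators.append(op)
    return operators

all_ops = disassemble(conformer_block)
```

The resulting operator list is what step S202 then analyzes against the second operation body's computing resources.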
The structure of the multi-head attention mechanism module is shown in fig. 3, where Feature is an input speech feature (speech X with length T), and the multi-head attention mechanism module contains three linear layer operators, namely a Query linear layer operator, a Key linear layer operator and a Value linear layer operator. The speech X passes through the Query linear layer to obtain Q, through the Key linear layer to obtain K, and through the Value linear layer to obtain V, where Q refers to Query, K refers to Key, and V refers to Value. The attention mechanism calculation formula corresponding to the attention mechanism operator in the multi-head attention mechanism module is as follows:
$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
From this formula, it can be seen that the attention mechanism operator can capture the entire field of view of the input speech.
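To make the formula concrete, here is a minimal NumPy sketch of scaled dot-product attention over T speech frames. Using the input itself as Q, K and V stands in for the Query/Key/Value linear layers and is purely illustrative.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (T, T): every frame attends to every frame
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)     # numerically stable row-wise softmax
    return w @ V

T, d = 5, 8                                   # T speech frames, d-dimensional features
rng = np.random.default_rng(0)
X = rng.normal(size=(T, d))
out = attention(X, X, X)                      # each output frame mixes the whole input
```

Because the (T, T) score matrix relates every frame to every other frame, the output indeed covers the full field of view of the input.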
S202, analyzing all operators in the first data processing model according to the computing resources of the second operation main body, and determining the operators which are not supported by the second operation main body in the first data processing model as the operators to be optimized.
Specifically, all operators in the first data processing model are analyzed, and operators which cannot be supported by the computing resources of the second operation main body are found out from the operators and serve as the operators to be optimized.
The computing resources of the second operation body can be determined through the implementation status of its operator operation interfaces: if an operator operation interface supporting a certain operator is provided in the second operation body, the second operation body has the computing resources corresponding to that operator. The second operation body provides a number of operator operation interfaces, such as simple interfaces for convolution, pooling and full connection. When analyzing all operators in the first data processing model, each operator can be matched against the operator operation interfaces in the second operation body by querying whether the second operation body has an interface supporting it. If such an interface exists, the operator is successfully matched and is an operator supported by the second operation body; if no such interface exists, the match fails and the operator is not supported by the second operation body. Each operator in the first data processing model that cannot be successfully matched with an operator operation interface in the second operation body is taken as an operator to be optimized.
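A minimal sketch of this interface-matching step. The interface names follow the device-end example in the text, but the exact names and the set-based representation are illustrative assumptions.

```python
# Operator operation interfaces provided by the second operation body (device
# end); "Symmetric Conv" denotes the symmetric convolution interface only.
device_interfaces = {"Linear", "Sigmoid", "Swish", "GLU",
                     "Symmetric Conv", "Pooling", "Full Connection"}

# All operators determined from the first data processing model (step S201).
model_operators = ["Linear", "Swish", "Self-Attention",
                   "LayerNorm", "Conv1D", "GLU", "Depthwise Conv1D"]

def find_operators_to_optimize(operators, interfaces):
    """Query each operator against the interfaces; unmatched ones need optimizing."""
    return [op for op in operators if op not in interfaces]

to_optimize = find_operators_to_optimize(model_operators, device_interfaces)
```

With these assumed interfaces, the unmatched operators are exactly the four identified in the analysis below: Self-Attention, LayerNorm, Conv1D and Depthwise Conv1D.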
For example, in a speech recognition scenario, the analysis result tables obtained after support analysis is performed on the operators contained in each module of the Conformer structure adopted by the cloud-deployed speech recognition system are shown below, where Table 1 is the analysis result table for the operators in the feedforward neural network module, Table 2 for the operators in the multi-head attention mechanism module, and Table 3 for the operators in the convolution module.
TABLE 1 (operators in the feedforward neural network module)

Operator | Supported by the second operation body (device end)
Linear   | supported
Swish    | supported (realizable via the Sigmoid activation function)

TABLE 2 (operators in the multi-head attention mechanism module)

Operator       | Supported by the second operation body (device end)
Linear         | supported
Self-Attention | not supported

TABLE 3 (operators in the convolution module)

Operator         | Supported by the second operation body (device end)
LayerNorm        | not supported
Conv1D           | not supported (asymmetric convolution)
GLU              | supported
Depthwise Conv1D | not supported (asymmetric convolution)
Swish            | supported
From the information in the above three tables, it can be seen that Swish in the feedforward neural network module of the Conformer structure is an activation function obtained by multiplying the input by the Sigmoid activation function. Since the second operation body (device end) in the speech recognition scenario supports the Sigmoid activation function, it also supports the Swish activation function. Therefore, the feedforward neural network module of the Conformer structure does not contain any operator to be optimized, and no optimization operator needs to be constructed for it.
The Self-Attention operator in the multi-head attention mechanism module of the Conformer structure requires products between matrices and the operation of a Softmax activation function. The second operation body (device end) in the speech recognition scenario does not support the Softmax activation function, and the product between matrices cannot be realized through full connection because its data parameters are not fixed, while the device end only supports simple operator operation interfaces such as convolution, pooling and full connection. Therefore, the Self-Attention operator is not supported by the second operation body (device end) and is an operator to be optimized. The LayerNorm operator in the convolution module of the Conformer structure is likewise not supported by the second operation body (device end). The Conv1D operator and the Depthwise Conv1D operator are both asymmetric convolution operators, while the second operation body (device end) only supports symmetric convolution operation, so it does not support these two operators either. Therefore, the LayerNorm operator, the Conv1D operator and the Depthwise Conv1D operator are all operators to be optimized.
As a preferred implementation manner, another embodiment of the present application discloses that the above-mentioned constructing, by using the operators supported by the second operation body, of the optimization operator corresponding to the operator to be optimized includes:
and constructing and obtaining an optimization operator corresponding to the operator to be optimized by combining or optimizing the operators supported by the second operation main body.
Specifically, different operators to be optimized have different functions, so the construction modes of the corresponding optimization operators differ. The construction modes of the optimization operators include a combination mode and an optimization mode. When the operator to be optimized is an operator that the second operation body does not support at all, the combination mode is adopted: several operators matching the operator to be optimized are selected from the operators supported by the second operation body to construct the corresponding optimization operator. When the operator to be optimized is of the same type as an operator supported by the second operation body but is unsupported because its data processing mode differs, the optimization mode is adopted to construct the corresponding optimization operator.
As a preferred implementation manner, referring to fig. 4, another embodiment of the present application discloses that the above-mentioned combining or optimizing the operators supported by the second operation subject to construct the optimization operator corresponding to the operator to be optimized, including:
s401, determining a first type of operator to be optimized and a second type of operator to be optimized from each operator to be optimized.
Specifically, the operators to be optimized are of multiple different types, and different types require different ways of constructing the optimization operator, so the operators to be optimized need to be divided into two types according to the applicable construction way. An operator that the second operation body does not support at all (i.e., an operator to be optimized whose type differs from every operator type supported by the second operation body) is taken as a first type of operator to be optimized, and an operator that is of the same type as an operator supported by the second operation body but is unsupported because its data processing mode differs is taken as a second type of operator to be optimized.
For example, in a speech recognition scenario, the operators to be optimized in the Conformer structure include: the Self-Attention operator, the LayerNorm operator, the Conv1D operator and the Depthwise Conv1D operator. The Self-Attention operator and the LayerNorm operator differ in type from every operator supported by the second operation body, so the second operation body cannot support them; they are therefore both first-type operators to be optimized. The Conv1D operator and the Depthwise Conv1D operator are convolution operators, the same type as operators supported by the second operation body, but the second operation body only supports symmetric convolution operation, while the Conv1D operator and the Depthwise Conv1D operator are one-dimensional convolution operators whose data processing mode is asymmetric convolution operation, different from the convolution operators supported by the second operation body. The Conv1D operator and the Depthwise Conv1D operator are therefore both second-type operators to be optimized.
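The two-way split of step S401 can be sketched as follows. The per-operator type annotations and the base-type comparison are assumptions made for illustration, chosen to match the example above.

```python
# Operator types supported by the second operation body (assumed labels).
supported_types = {"linear", "activation", "symmetric_convolution", "pooling"}

# Operators to be optimized, annotated with an assumed type label.
to_optimize = {
    "Self-Attention":   "attention",               # no such type on the device end
    "LayerNorm":        "normalization",           # no such type on the device end
    "Conv1D":           "asymmetric_convolution",  # convolution exists, mode differs
    "Depthwise Conv1D": "asymmetric_convolution",
}

def classify(ops, supported):
    """First type: no operator of the same base type is supported at all.
    Second type: same base type supported, but data processing mode differs."""
    first, second = [], []
    for name, op_type in ops.items():
        base = op_type.split("_")[-1]  # e.g. asymmetric_convolution -> convolution
        if any(base == s.split("_")[-1] for s in supported):
            second.append(name)
        else:
            first.append(name)
    return first, second

first_type, second_type = classify(to_optimize, supported_types)
```

Under these assumed labels, Self-Attention and LayerNorm fall into the first type (combination mode), and the two asymmetric convolutions into the second type (optimization mode), matching the text.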
S402, by combining operators supported by the second operation main body, operators with the same functions as the first type of operators to be optimized are constructed, and the optimized operators corresponding to the first type of operators to be optimized are obtained.
Specifically, for operators to be optimized whose types differ from the operator types supported by the second operation body, and which are therefore not supported by the second operation body, the corresponding optimization operators need to be constructed in a combined manner; that is, the optimization operators corresponding to the first type of operators to be optimized are constructed by combination. First, several operators that can be combined to realize the same function as the first-type operator to be optimized are selected from the operators supported by the second operation body; then the selected operators are combined into an optimization operator according to the corresponding operator combination mode, so that the optimization operator realizes the same function as the corresponding first-type operator to be optimized.
When selecting, from the operators supported by the second operation body, several operators that can be combined to realize the same function as the first-type operator to be optimized, a first operator that can realize the function of the first-type operator to be optimized needs to be selected, together with a corresponding adaptation operator that performs corresponding data adjustment on the first operator, so that the input and output types of the first operator after adaptation adjustment are the same as those of the first-type operator to be optimized.
And S403, optimizing an operator which is supported by the second operation main body and has the same type as the operator to be optimized of the second type to obtain an optimization operator corresponding to the operator to be optimized of the second type.
Specifically, for operators of the same type as the operators supported by the second operation body but with different data processing modes, an optimization mode needs to be adopted to construct the corresponding optimization operator; that is, the optimization operator corresponding to the second-type operator to be optimized is constructed by optimization. First, a second operator of the same type as the second-type operator to be optimized is selected from the operators supported by the second operation body. The second operator and the second-type operator to be optimized are of the same type but have different data processing modes, so after the second operator is selected, it needs to be optimized so that the data type it outputs is the same as the data type output by the second-type operator to be optimized.
For example, in a speech recognition scenario, the Conv1D operator and the Depthwise Conv1D operator in the Conformer structure are convolution operators, but their data processing mode is asymmetric convolution operation. The second operator selected from the second operation body is therefore also a convolution operator, but the convolution operators supported by the second operation body all adopt symmetric convolution operation, so the data dimension output by the asymmetric convolution operation of the Conv1D and Depthwise Conv1D operators differs from the data dimension output by the symmetric convolution operation of the second operator. Therefore, the convolution operator selected from the second operation body needs to be optimized, so that the data dimension output by its symmetric convolution operation is the same as the data dimension output by the asymmetric convolution operation of the Conv1D and Depthwise Conv1D operators.
In this embodiment, the execution order of step S402 and step S403 is not limited, and step S402 may be executed first and then step S403 may be executed, step S403 may be executed first and then step S402 may be executed, or step S402 and step S403 may be executed simultaneously.
In the following, taking as an example the structural optimization of a Conformer-based speech recognition system so that it can be deployed on a resource-limited device end, the specific operator optimization process is introduced.
As a preferred implementation manner, another embodiment of the present application discloses that the first type of operator to be optimized is an attention mechanism operator or a layer normalization operator. For example, in a Conformer-based speech recognition scenario, the above Self-Attention operator is an attention mechanism operator, and the LayerNorm operator is a layer normalization operator.
The above-mentioned combining the operators supported by the second operation subject to construct an operator having the same function as the first type of operator to be optimized to obtain an optimization operator corresponding to the first type of operator to be optimized includes:
if the first type of operator to be optimized is an attention mechanism operator, firstly, a neural network, a batch normalization operator and an activation function which are matched with the functions of the attention mechanism operator are utilized to construct and obtain a semi-optimization operator corresponding to the attention mechanism operator.
Specifically, if the first type of operator to be optimized is an attention mechanism operator, a neural network matching the function of the attention mechanism operator may be selected from the operators supported by the second operation body, so that the function realized by the attention mechanism operator can be realized. The neural network, a batch normalization operator and an activation function are combined according to the functions and characteristics of these operators, so that a semi-optimization operator corresponding to the attention mechanism operator is constructed.
For example, in a speech recognition scenario, the neural network may employ a Feedforward Sequential Memory Network (FSMN), which, like the Self-Attention operator, is capable of learning contextual information of speech, and the activation function may employ the Relu activation function. A semi-optimization operator corresponding to the Self-Attention operator is constructed using a feedforward sequential memory network, a batch normalization operator and the Relu activation function. FIG. 5 shows a multi-head sequential memory mechanism module, which is composed of the Query linear layer operator from the multi-head attention mechanism module together with the feedforward sequential memory networks, batch normalization operator and Relu activation function of the semi-optimization operator. The combination of the feedforward sequential memory network, batch normalization operator and Relu activation function in the semi-optimization operator is shown in FIG. 5, where Norm represents the batch normalization layer operator, Relu represents the Relu activation function, and FSMN_K and FSMN_V are both feedforward sequential memory networks: FSMN_K replaces the determination of K, which the multi-head attention mechanism module obtains through the Key linear layer, and FSMN_V replaces the determination of V, which it obtains through the Value linear layer.
In addition, the implementation of the feedforward sequential memory network is as follows:
$$\tilde{h}_t = h_t + \sum_{i=0}^{N_1} a_i \odot h_{t-i} + \sum_{j=1}^{N_2} c_j \odot h_{t+j}$$
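The FSMN memory block can be sketched directly from its definition. The following NumPy toy uses scalar taps for clarity; real FSMNs use learned per-dimension coefficient vectors, so this is an illustrative assumption rather than the exact implementation.

```python
import numpy as np

def fsmn_memory(h, a, c):
    """Bidirectional FSMN memory block with scalar taps:
    out_t = h_t + sum_i a[i] * h_{t-i} + sum_j c[j-1] * h_{t+j}."""
    h = np.asarray(h, dtype=float)
    out = h.copy()
    T = len(h)
    for t in range(T):
        for i, ai in enumerate(a):              # look-back taps i = 0..N1
            if t - i >= 0:
                out[t] += ai * h[t - i]
        for j, cj in enumerate(c, start=1):     # look-ahead taps j = 1..N2
            if t + j < T:
                out[t] += cj * h[t + j]
    return out

h = np.linspace(0.0, 1.0, 6)              # six hidden-layer frames
identity = fsmn_memory(h, [0.0], [])      # zero taps: reduces to the identity
doubled = fsmn_memory(h, [1.0], [])       # single tap a_0 = 1: doubles each frame
```

The nested loop over fixed look-back and look-ahead taps is exactly a 1-D convolution over time, which is why the text notes that the FSMN can be realized with a convolution network.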
According to the formula, the feedforward sequential memory network can be realized using a convolution network, but this convolution network adopts asymmetric convolution operation. Because the second operation body only supports symmetric convolution operation and does not support asymmetric convolution operation, if the feedforward sequential memory network is deployed on the second operation body, it must be realized with symmetric convolution operation, which changes the dimension of the output data. Therefore, a data dimension processing module needs to be added before the input data enters the feedforward sequential memory network, and a zero-padding operation is performed on the input data by this module, so that the data dimension output by the feedforward sequential memory network performing symmetric convolution on the zero-padded input data is the same as the data dimension output by the feedforward sequential memory network performing asymmetric convolution on the input data without zero padding.
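The zero-padding trick can be checked with a small NumPy sketch: left-only padding emulates the asymmetric (causal) convolution, while symmetric padding on both sides, as performed by the data dimension processing module, lets a symmetric convolution produce an output of the same length. The kernel size, data and the `conv1d_valid` helper are illustrative assumptions.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Plain 'valid' 1-D convolution (correlation): output length T - k + 1."""
    k = len(kernel)
    return np.array([x[t:t + k] @ kernel for t in range(len(x) - k + 1)])

T, k = 10, 5
rng = np.random.default_rng(1)
x = rng.normal(size=T)
kernel = rng.normal(size=k)

# Asymmetric (causal) convolution: pad k-1 zeros on the left only.
causal_out = conv1d_valid(np.concatenate([np.zeros(k - 1), x]), kernel)

# Symmetric convolution on the device end: the data dimension processing
# module pads (k-1)//2 zeros on each side before the convolution.
pad = (k - 1) // 2
symmetric_out = conv1d_valid(np.concatenate([np.zeros(pad), x, np.zeros(pad)]), kernel)

# Both variants produce T output frames, so the output dimension is unchanged.
```

Note that an odd kernel size is assumed here; with even kernels the symmetric padding would need an extra frame on one side.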
The batch normalization operator can be used to normalize the output of the feedforward sequential memory network and accelerate the convergence of the model, and a nonlinear representation is then obtained through the Relu activation function.
Secondly, integrating the batch normalization operators in the semi-optimization operators into convolution operators to obtain optimization operators corresponding to the attention mechanism operators.
Specifically, the semi-optimization operator constructed for the attention mechanism operator uses a batch normalization operator, but the second operation body does not support the batch normalization operator. The batch normalization operator therefore needs to be integrated into a convolution operator supported by the second operation body, so that it is realized through the operation mode of the convolution operator and thereby supported by the second operation body. The operator obtained after the batch normalization operator in the semi-optimization operator is integrated into the convolution operator is used as the optimization operator corresponding to the attention mechanism operator.
The implementation mode of integrating the batch normalization operator into the convolution operator is as follows:
(1) The operation of the convolution operator can be written simply as:

$$y_{conv} = w * x + b$$
(2) The operation of the batch normalization operator can be written as:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$$

$$\sigma^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu)^2$$

$$x'_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}$$

$$y_{bn}(x_i) = \gamma x'_i + \beta$$

(3) Fusion of the convolution operator and the batch normalization operator:

$$y = \gamma \cdot \frac{(w * x + b) - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

Suppose that:

$$w' = \frac{\gamma w}{\sqrt{\sigma^2 + \epsilon}}, \qquad b' = \frac{\gamma (b - \mu)}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

then:

$$y_{conv} = w' * x + b'$$
Thus the batch normalization operator is integrated into the convolution operator: the fused operator is supported by the second operation body while still realizing the batch normalization function.
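The fusion can be verified numerically. The sketch below uses a 1x1 convolution (a plain scale and shift) for brevity; the parameter names mirror the derivation and the specific values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)

w, b = 0.7, 0.2                      # 1x1 convolution: y_conv = w*x + b
gamma, beta, eps = 1.5, -0.3, 1e-5   # batch normalization parameters

# Separate pipeline: convolution followed by batch normalization.
y_conv = w * x + b
mu, var = y_conv.mean(), y_conv.var()
y_bn = gamma * (y_conv - mu) / np.sqrt(var + eps) + beta

# Fused parameters from the derivation: w' and b' absorb the normalization.
w_f = gamma * w / np.sqrt(var + eps)
b_f = gamma * (b - mu) / np.sqrt(var + eps) + beta
y_fused = w_f * x + b_f              # a single convolution realizes both operators
```

The same folding applies elementwise per output channel for a real convolution kernel, since batch normalization scales and shifts each channel independently.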
Further, the above-mentioned combining the operators supported by the second operation subject to construct an operator having the same function as the first type of operator to be optimized to obtain an optimization operator corresponding to the first type of operator to be optimized further includes:
and if the first type operator to be optimized is a layer normalization operator, integrating the batch normalization operator matched with the layer normalization operator into a convolution operator to obtain an optimization operator with the same function as the layer normalization operator.
Specifically, if the first type of operator to be optimized is a layer normalization operator, a batch normalization operator with the same normalization function as the layer normalization operator can be obtained. Since the second operation body does not support the batch normalization operator, the batch normalization operator needs to be merged into a convolution operator, yielding the optimization operator corresponding to the layer normalization operator. This optimization operator has the same structure as a convolution operator and can be supported by the second operation body, while realizing the function of the layer normalization operator. The way batch normalization is merged into a convolution operator is described in detail in the above embodiment and is not repeated here.
As a preferred implementation manner, another embodiment of the present application discloses that the second type of operator to be optimized is an asymmetric convolution operator. For example, in a Conformer-based speech recognition scenario, the Conv1D operator and the Depthwise Conv1D operator are both asymmetric convolution operators.
The obtaining of the optimization operator corresponding to the second type of operator to be optimized by optimizing the operator of the same type as the second type of operator to be optimized, which is supported by the second operation subject, includes:
and adding a data dimension processing module to the symmetric convolution operator supported by the second operation main body to obtain an optimization operator corresponding to the asymmetric convolution operator.
Specifically, because the second operation body only supports symmetric convolution operation and does not support asymmetric convolution operation, for an asymmetric convolution operator, an operator of the same type supported by the second operation body, i.e., a symmetric convolution operator, needs to be selected. However, the data dimension of the output obtained by the symmetric convolution operator performing symmetric convolution on the input data differs from the data dimension of the output obtained by the asymmetric convolution operator performing asymmetric convolution on the same input. A data dimension processing module can therefore be added in front of the symmetric convolution operator to adjust the data dimension of the input data, for example through a zero-padding operation, so that the adjusted input data meets the requirements of the symmetric convolution operator. The adjusted input data is then fed into the symmetric convolution operator for symmetric convolution, and the data dimension of the resulting output is the same as the data dimension of the output obtained by the asymmetric convolution operator performing asymmetric convolution on the unadjusted input data. Therefore, after the data dimension processing module is added, the symmetric convolution operator of the same type as the asymmetric convolution operator can serve as the optimization operator corresponding to the asymmetric convolution operator, and this optimization operator is supported by the second operation body.
As a preferred implementation manner, another embodiment of the present application discloses that the data processing model building method further includes:
training the second data processing model.
Specifically, the second data processing model is obtained from the first data processing model through operator replacement, and the operator replacement may reduce the performance of the second data processing model to a certain extent, affecting data processing accuracy. The second data processing model can be trained using pre-collected model training data: the model training data is input into the second data processing model, the loss function of the second data processing model is determined from its processing of the training data, and the parameters of the second data processing model are adjusted according to the loss function until the loss is within a preset threshold range. Training of the second data processing model is then complete, and the model has higher data processing accuracy.
As a preferred implementation, referring to fig. 6, another embodiment of the present application discloses that the training of the second data processing model includes:
s601, obtaining model training data and a data processing reference model corresponding to the second data processing model.
Specifically, since the model training needs to be trained using the training data, the model training data trained on the second data processing model needs to be acquired in advance. The second data processing model is used for processing the input information to be processed so as to obtain a processed result. Therefore, in a supervised training mode, the model training data to be acquired should include the information to be processed and the data processing result corresponding to the information to be processed. For example, in a speech recognition scenario, the speech recognition system is used to recognize the input speech signal into a corresponding text result, so the supervised training data needs to include not only audio, but also the labeled text corresponding to each audio.
In addition, this embodiment may utilize a multi-task joint training criterion in order to minimize the loss during model training. To perform multi-task joint training, a data processing reference model needs to be obtained, and its performance needs to be higher than that of the second data processing model, so that when the data processing reference model and the second data processing model are jointly trained, the second data processing model can be constrained by the data processing reference model, improving the training efficiency of the second data processing model.
For example, in a speech recognition scenario, the recognition accuracy of the speech recognition reference model needs to be higher than that of the speech recognition system to be trained. The speech recognition reference model can adopt the Conformer-structure speech recognition system currently used in the cloud; this system has already been applied in the cloud and has a good speech recognition effect, so it can serve as the speech recognition reference model.
And S602, performing multi-task joint training on the second data processing model and the data processing reference model by using the model training data.
Specifically, multi-task joint training of the second data processing model and the data processing reference model means inputting the model training data into both models at the same time, letting the data processing result of the reference model constrain the data processing result of the second data processing model, determining the loss functions of the two models, and adjusting the parameters of the second data processing model according to those loss functions, so as to improve the performance of the second data processing model. The multi-task joint training can improve both the training efficiency and the robustness of the second data processing model.
When performing model training, the model training data may first be used to train the data processing reference model alone; once the reference model is reasonably well trained, the second data processing model is added, and the model training data is then used for multi-task joint training of the two models. Because the performance of the reference model is higher than that of the second data processing model, it can constrain the second data processing model, and the two models progress together. Alternatively, the reference model need not be trained in advance: the two models can be jointly trained directly on the model training data, with the parameters of both models adjusted continuously during training so that the performance of both improves at the same time, yielding two high-performance models. If the reference model is already a high-performance model, the model training data can still be used for multi-task joint training of the two models, but in this case the reference model only constrains the second data processing model, which keeps adjusting its own parameters during training to improve its performance, while the parameters of the reference model itself are not adjusted during training.
As a preferred implementation manner, referring to fig. 7, another embodiment of the present application discloses that the model training data is speech recognition training data. The performing, by using the model training data, the multitask joint training on the second data processing model and the data processing reference model includes:
and S701, inputting the voice recognition training data into the data processing reference model and the second data processing model respectively.
Specifically, as shown in the framework diagram of the multi-task joint training in fig. 8, when the data processing reference model and the second data processing model are jointly trained, two independent encoders may share one decoder, where Encoder(SAN) is the encoder used by the second data processing model, Encoder(FAN) is the encoder used by the data processing reference model, and Decoder(share) is the decoder shared by the data processing reference model and the second data processing model.
In a speech recognition scenario, the speech recognition training data is input into the data processing reference model and the second data processing model respectively. Because non-streaming speech recognition is more accurate, the streaming and non-streaming speech recognition models are trained jointly: the data processing reference model acts as the non-streaming model and performs non-streaming recognition on the speech recognition training data, while the second data processing model acts as the streaming model and performs streaming recognition on the same data. A streaming speech recognition system performs far worse than a non-streaming one, because to realize streaming recognition the second data processing model must strictly limit its field of view and cannot see any future information of the speech. The joint streaming/non-streaming training therefore improves the performance of non-streaming recognition, and at the same time the more accurate recognition results of the non-streaming model constrain the recognition of the streaming model, so the recognition accuracy of the second data processing model is also improved.
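The "strictly limited field of view" described above can be made concrete as an attention mask over frames. The following sketch is illustrative only — the function name and context sizes are assumptions, not taken from this disclosure:

```python
import numpy as np

def streaming_attention_mask(num_frames, left_context, right_context=0):
    """Boolean mask: entry (i, j) is True when frame i may attend to frame j.

    A streaming model limits the field of view: each frame sees at most
    `left_context` past frames and `right_context` future frames (0 for
    strictly causal recognition, i.e. no future information at all).
    A non-streaming model would use an all-True mask instead.
    """
    idx = np.arange(num_frames)
    rel = idx[None, :] - idx[:, None]          # rel[i, j] = j - i
    return (rel >= -left_context) & (rel <= right_context)

mask = streaming_attention_mask(5, left_context=2, right_context=0)
```

With `right_context=0` the mask blocks every future frame, which is what distinguishes the streaming model from the non-streaming reference model in the joint training above.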
S702, calculating a first loss function corresponding to the second data processing model and a second loss function corresponding to the data processing reference model.
Specifically, after the data processing reference model and the second data processing model have performed speech recognition on the speech recognition training data, the loss function of each model needs to be calculated: the loss function corresponding to the second data processing model is the first loss function, and the loss function corresponding to the data processing reference model is the second loss function. In fig. 8, loss_SAN is the first loss function and loss_FAN is the second loss function.
S703, adjusting parameters of the second data processing model by using the first loss function and the second loss function until the first loss function and the second loss function are within a preset loss range.
Specifically, in this embodiment the model may be trained by joint gradient back-propagation of the two loss functions, that is, the parameters of the second data processing model are adjusted according to the sum of the first loss function and the second loss function, so as to improve the performance of the second data processing model. When both loss functions fall within the preset loss range, the error of the two models is within an acceptable range, model training can be stopped, and the second data processing model that has completed parameter adjustment serves as the model that can be deployed on the second operation main body. In addition, during model training the parameters of the data processing reference model may also be adjusted according to the sum of the first loss function and the second loss function, so that the performance of the reference model improves as well and a higher-performance data processing reference model is obtained.
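The parameter-adjustment rule in S703 can be sketched with scalar stand-ins. The quadratic losses and all names below are hypothetical; in practice loss_SAN and loss_FAN would be the streaming and non-streaming models' recognition losses:

```python
import numpy as np

# Hypothetical scalar stand-ins: each model has one parameter and a
# simple quadratic loss, so the gradients can be written by hand.
def loss_san(w_san):          # first loss function (second data processing model)
    return (w_san - 2.0) ** 2

def loss_fan(w_fan):          # second loss function (data processing reference model)
    return (w_fan - 1.0) ** 2

def joint_training_step(w_san, w_fan, lr=0.1, update_reference=False):
    """One joint back-propagation step on the summed loss.

    The second model is always updated from the sum of the two losses
    (loss_fan does not depend on w_san, so only the loss_san term
    contributes to its gradient); the reference model is updated only
    when it is trained jointly rather than kept frozen.
    """
    grad_san = 2.0 * (w_san - 2.0)    # d(loss_san + loss_fan) / d w_san
    w_san = w_san - lr * grad_san
    if update_reference:
        grad_fan = 2.0 * (w_fan - 1.0)
        w_fan = w_fan - lr * grad_fan
    return w_san, w_fan

w_san, w_fan = 0.0, 0.0
for _ in range(100):
    w_san, w_fan = joint_training_step(w_san, w_fan, update_reference=False)
```

With `update_reference=False` this matches the frozen-reference variant described above; setting it to True gives the variant where both models improve together.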
As a preferred implementation manner, another embodiment of the present application discloses that, before performing the multitask joint training on the second data processing model and the data processing reference model by using the model training data, the method further includes:
and carrying out speaker information normalization processing on the voice recognition training data.
Specifically, in a speech recognition scenario, different speakers have different sound characteristics such as timbre and pitch, and these differences affect the accuracy of speech recognition. To reduce this influence, each audio clip needs to be marked with a speaker tag, i.e. the identifier of its speaker, when the speech recognition training data is obtained. Then, before the speech recognition training data is input into the second data processing model and the data processing reference model, the speaker information corresponding to each item of training data is determined from the pre-labeled speaker identifier, and the speaker information in each item of training data is normalized. If speaker information normalization is performed during model training, then when the trained second data processing model is deployed on the second operation main body for application, the speech to be recognized is likewise normalized for speaker information before recognition, which improves the accuracy of speech recognition.
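One plausible reading of this "speaker information normalization" is per-speaker mean and variance normalization of the acoustic features, keyed by the pre-marked speaker identifiers. The sketch below assumes that interpretation; the function name and data shapes are illustrative:

```python
import numpy as np

def speaker_cmvn(features, speaker_ids):
    """Per-speaker mean and variance normalization.

    features    : (num_utterances, feat_dim) array of utterance features
    speaker_ids : speaker label of each utterance, from the pre-marked tags
    Returns features with zero mean and unit variance within each speaker,
    reducing the influence of speaker-specific timbre and pitch.
    """
    normalized = np.empty_like(features, dtype=float)
    for spk in np.unique(speaker_ids):
        rows = speaker_ids == spk
        mu = features[rows].mean(axis=0)
        sigma = features[rows].std(axis=0) + 1e-8   # avoid division by zero
        normalized[rows] = (features[rows] - mu) / sigma
    return normalized
```

The same statistics-per-speaker normalization would then be applied to the speech to be recognized at deployment time, as the paragraph above requires.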
Further, in a speech recognition scenario, the input acoustic features of the model need to be selected; PLP, MFCC, and FBANK features are commonly used in speech recognition. FBANK features retain more acoustic detail and fit the end-to-end, big-data modeling approach well, so FBANK features are typically used as the input acoustic features of models in speech recognition systems with an end-to-end framework.
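For reference, FBANK features are log-mel filterbank energies. The following is a minimal sketch of their computation, not a production extractor (pre-emphasis, windowing, and dithering are omitted for brevity; all parameter values are common defaults, not values from this disclosure):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def fbank(signal, sample_rate=16000, n_fft=512, n_mels=40,
          frame_len=400, frame_shift=160):
    """Log-mel filterbank (FBANK) features, shape (num_frames, n_mels)."""
    # Frame the signal: 25 ms frames every 10 ms at 16 kHz.
    num_frames = 1 + (len(signal) - frame_len) // frame_shift
    frames = np.stack([signal[i * frame_shift: i * frame_shift + frame_len]
                       for i in range(num_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2   # (frames, n_fft//2 + 1)

    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    filters = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            filters[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            filters[m - 1, k] = (right - k) / max(right - center, 1)

    return np.log(power @ filters.T + 1e-10)          # small floor before log
```

Unlike MFCC, no discrete cosine transform is applied afterward, which is why FBANK retains more spectral detail for end-to-end models.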
Corresponding to the above data processing model construction method, an embodiment of the present application further provides a data processing model construction apparatus, as shown in fig. 9, the apparatus including:
an operator determining unit 100, configured to determine an operator to be optimized from a first data processing model running on a first running body; the operator to be optimized is an operator which is not supported by the second operation main body;
the operator constructing unit 110 is configured to construct an optimization operator corresponding to the operator to be optimized by using an operator supported by the second operation subject, where the optimization operator can implement the same function as the corresponding operator to be optimized;
and an operator replacing unit 120, configured to replace, by using the optimization operator, an operator to be optimized in the first data processing model, to obtain a second data processing model.
In the data processing model constructing apparatus provided in this embodiment of the present application, the operator determining unit 100 determines, from the first data processing model running on the first running body, the operators not supported by the second running body as the operators to be optimized. The operator constructing unit 110 constructs, using operators supported by the second running body, an optimization operator corresponding to each operator to be optimized, so that the optimization operator realizes the same function as the corresponding operator to be optimized while being supported by the second running body. The operator replacing unit 120 then replaces the operators to be optimized in the first data processing model with the optimization operators to obtain the second data processing model. The second data processing model thus has the same function and framework as the first data processing model but can be deployed on the second running body, so a data processing model realizing the same function with the same framework can be deployed on running bodies with different computing resources. This improves the practicability of the data processing model and relieves the development pressure on the data processing model.
As an optional implementation manner, another embodiment of the present application further discloses that, when the operator determining unit 100 determines an operator to be optimized from a first data processing model running on a first running agent, the operator determining unit is specifically configured to:
operator structure disassembly is carried out on the first data processing model, and all operators in the first data processing model are determined;
and analyzing all operators in the first data processing model according to the computing resources of the second operation main body, and determining the operators which are not supported by the second operation main body in the first data processing model as the operators to be optimized.
As an optional implementation manner, another embodiment of the present application further discloses that the operator constructing unit 110 is specifically configured to:
and constructing and obtaining an optimization operator corresponding to the operator to be optimized by combining or optimizing the operators supported by the second operation main body.
As an optional implementation manner, another embodiment of the present application further discloses that the operator constructing unit 110 includes:
the operator type determining unit is used for determining a first type of operator to be optimized and a second type of operator to be optimized from each operator to be optimized; the first type of operator to be optimized is an operator which is not supported by the second operation main body, and the second type of operator to be optimized is an operator which has a different data processing mode from the operator of the same type supported by the second operation main body;
the operator combination unit is used for combining the operators supported by the second operation main body, constructing operators with the same functions as the first type of operators to be optimized, and obtaining the optimized operators corresponding to the first type of operators to be optimized;
and the operator optimization unit is used for optimizing an operator which is supported by the second operation main body and has the same type as the operator to be optimized of the second type to obtain an optimization operator corresponding to the operator to be optimized of the second type.
As an optional implementation manner, another embodiment of the present application further discloses that the first type of operator to be optimized is an attention mechanism operator or a layer normalization operator;
the operator combination unit is configured to construct an operator having the same function as the first type of operator to be optimized by combining operators supported by the second operation subject, and when obtaining an optimization operator corresponding to the first type of operator to be optimized, the operator combination unit is specifically configured to:
if the first type of operator to be optimized is an attention mechanism operator, constructing by using a neural network, a batch normalization operator and an activation function which are matched with the functions of the attention mechanism operator to obtain a semi-optimization operator corresponding to the attention mechanism operator;
integrating the batch normalization operator in the semi-optimization operator into a convolution operator to obtain an optimization operator corresponding to the attention mechanism operator;
or if the first type operator to be optimized is a layer normalization operator, integrating batch normalization operators matched with the layer normalization operator into a convolution operator to obtain an optimization operator with the same function as the layer normalization operator;
wherein the neural network, the convolution operator, and the activation function are all operators supported by the second operation body.
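The step of "integrating the batch normalization operator into a convolution operator" is a standard weight-folding identity. The sketch below demonstrates it for a 1x1 convolution (a matrix multiply per position); the shapes and names are illustrative assumptions:

```python
import numpy as np

def fold_bn_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a batch-normalization operator into the preceding convolution.

    W : (out_channels, in_channels) weight of a 1x1 convolution,
    b : (out_channels,) bias.  gamma/beta/mean/var are the per-channel
    BN parameters and running statistics.  Returns fused (W', b') with
    conv(x; W', b') == BN(conv(x; W, b)) for every input x, so the BN
    operator disappears from the deployed graph.
    """
    scale = gamma / np.sqrt(var + eps)      # per-channel rescaling
    W_fused = W * scale[:, None]            # scale each output row
    b_fused = (b - mean) * scale + beta
    return W_fused, b_fused

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)

x = rng.normal(size=3)                      # one spatial position
y_bn = ((W @ x + b) - mean) / np.sqrt(var + 1e-5) * gamma + beta
Wf, bf = fold_bn_into_conv(W, b, gamma, beta, mean, var)
y_fused = Wf @ x + bf
```

Because the folded result is an ordinary convolution, the second operation body only needs to support convolution operators, which is the point of this optimization.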
As an optional implementation manner, another embodiment of the present application further discloses that the second type of operator to be optimized is an asymmetric convolution operator;
the operator optimization unit is specifically configured to, when obtaining an optimization operator corresponding to the second type of operator to be optimized by optimizing an operator of the same type as the second type of operator to be optimized, which is supported by the second operation subject:
adding a data dimension processing module to the symmetric convolution operator supported by the second operation main body to obtain an optimization operator corresponding to the asymmetric convolution operator;
the data dimension processing module is configured to adjust an input data dimension of the symmetric convolution operator, so that the input data dimension meets the requirement of the symmetric convolution operator.
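The "data dimension processing module" can be illustrated by zero-padding: an asymmetric 1x3 kernel is embedded in a square 3x3 kernel, and the input gains matching zero rows so the symmetric operator reproduces the asymmetric result. This is one hedged interpretation of the module, with all names and shapes assumed for illustration:

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive 'valid' 2-D cross-correlation — a stand-in for the symmetric
    convolution operator supported by the second operation body."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def asymmetric_via_symmetric(x, k_1x3):
    """Run a 1x3 (asymmetric) convolution using only a square 3x3 kernel.

    The dimension-processing step is modeled as padding: the 1x3 weights
    occupy the middle row of a 3x3 kernel (other rows are zero), and the
    input gets one zero row above and below so the output shape matches.
    """
    k_3x3 = np.zeros((3, 3))
    k_3x3[1, :] = k_1x3                     # middle row carries the weights
    x_pad = np.pad(x, ((1, 1), (0, 0)))     # adjust the input data dimension
    return conv2d_valid(x_pad, k_3x3)

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 6))
k = rng.normal(size=3)
direct = conv2d_valid(x, k[None, :])        # true 1x3 result
via_square = asymmetric_via_symmetric(x, k)
```

The two outputs agree exactly, so hardware that only offers symmetric kernels can still realize the asymmetric operator once the input dimension is adjusted.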
As an optional implementation manner, another embodiment of the present application further discloses that the data processing model constructing apparatus further includes: and the model training unit is used for training the second data processing model.
As an optional implementation manner, another embodiment of the present application further discloses that the model training unit includes:
the data acquisition unit is used for acquiring model training data and a data processing reference model corresponding to the second data processing model;
the joint training unit is used for performing multi-task joint training on the second data processing model and the data processing reference model by using the model training data;
and the data processing reference model is a data processing model with performance higher than that of the second data processing model.
As an optional implementation manner, another embodiment of the present application further discloses that, when the joint training unit performs multi-task joint training on the second data processing model and the data processing reference model by using the model training data, the joint training unit is specifically configured to:
inputting the speech recognition training data into the data processing reference model and the second data processing model, respectively, so that the data processing reference model performs non-streaming recognition on the speech recognition training data, and so that the second data processing model performs streaming recognition on the speech recognition training data;
calculating a first loss function corresponding to the second data processing model and a second loss function corresponding to the data processing reference model;
and adjusting parameters of the second data processing model by using the first loss function and the second loss function until the first loss function and the second loss function are within a preset loss range.
As an optional implementation manner, another embodiment of the present application further discloses that the data processing model building apparatus further includes:
and the normalization unit is used for carrying out speaker information normalization processing on the voice recognition training data.
Specifically, please refer to the description of the above method embodiment for the specific working content of each unit of the data processing model building apparatus, which is not repeated here.
Another embodiment of the present application further discloses a data processing model building apparatus, as shown in fig. 10, the apparatus includes:
a memory 200 and a processor 210;
wherein, the memory 200 is connected to the processor 210 for storing programs;
the processor 210 is configured to implement the data processing model building method disclosed in any of the above embodiments by running the program stored in the memory 200.
Specifically, the data processing model building device may further include: a bus, a communication interface 220, an input device 230, and an output device 240.
The processor 210, the memory 200, the communication interface 220, the input device 230, and the output device 240 are connected to each other through a bus. Wherein:
a bus may include a path that transfers information between components of a computer system.
The processor 210 may be a general-purpose processor, such as a general-purpose Central Processing Unit (CPU) or a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs of the present solution. It may also be a digital signal processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components.
The processor 210 may include a main processor and may also include a baseband chip, modem, and the like.
The memory 200 stores programs for executing the technical solution of the present invention, and may also store an operating system and other key services. In particular, the program may include program code comprising computer operating instructions. More specifically, memory 200 may include a read-only memory (ROM), other types of static storage devices that may store static information and instructions, a Random Access Memory (RAM), other types of dynamic storage devices that may store information and instructions, disk storage, flash memory, and so forth.
The input device 230 may include a means for receiving data and information input by a user, such as a keyboard, mouse, camera, scanner, light pen, voice input device, touch screen, pedometer or gravity sensor, etc.
Output device 240 may include equipment that allows output of information to a user, such as a display screen, printer, speakers, etc.
Communication interface 220 may include any means for using a transceiver or the like to communicate with other devices or communication networks, such as ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), etc.
The processor 210 executes the programs stored in the memory 200 and invokes the other devices, which together may be used to implement the steps of the data processing model building method provided by the embodiments of the present application.
Another embodiment of the present application further provides a storage medium, where a computer program is stored on the storage medium, and when the computer program is executed by a processor, the computer program implements the steps of the data processing model building method provided in any one of the embodiments.
While, for purposes of simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present application is not limited by the order of acts or acts described, as some steps may occur in other orders or concurrently with other steps in accordance with the application. Further, those skilled in the art will recognize that the embodiments described in this specification are preferred embodiments and that acts or modules referred to are not necessarily required for this application.
It should be noted that, in this specification, each embodiment is described in a progressive manner, and each embodiment focuses on differences from other embodiments, and portions that are the same as and similar to each other in each embodiment may be referred to. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and reference may be made to the partial description of the method embodiment for relevant points.
The steps in the method of the embodiments of the present application may be sequentially adjusted, combined, and deleted according to actual needs.
The modules and sub-modules in the device and the terminal in the embodiments of the application can be combined, divided and deleted according to actual needs.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal, apparatus and method may be implemented in other manners. For example, the above-described terminal embodiments are merely illustrative, and for example, the division of a module or a sub-module is only one logical function division, and other division manners may be available in actual implementation, for example, a plurality of sub-modules or modules may be combined or integrated into another module, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules or sub-modules described as separate parts may or may not be physically separate, and parts that are modules or sub-modules may or may not be physical modules or sub-modules, may be located in one place, or may be distributed over a plurality of network modules or sub-modules. Some or all of the modules or sub-modules can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, each functional module or sub-module in the embodiments of the present application may be integrated into one processing module, or each module or sub-module may exist alone physically, or two or more modules or sub-modules may be integrated into one module. The integrated modules or sub-modules can be implemented in the form of hardware, and can also be implemented in the form of software functional modules or sub-modules.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components and steps of the various examples have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software unit executed by a processor, or in a combination of the two. The software cells may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element defined by the phrase "comprising a …" does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (13)

1. A method for constructing a data processing model, comprising:
determining an operator to be optimized from a first data processing model running on a first running main body; the operator to be optimized is an operator which is not supported by the second operation main body;
an operator supported by the second operation main body is utilized to construct an optimization operator corresponding to the operator to be optimized, and the optimization operator can realize the same function as the corresponding operator to be optimized;
and replacing the operator to be optimized in the first data processing model by using the optimization operator to obtain a second data processing model.
2. The method of claim 1, wherein determining an operator to be optimized from a first data processing model running on a first running agent comprises:
operator structure disassembly is carried out on the first data processing model, and all operators in the first data processing model are determined;
and analyzing all operators in the first data processing model according to the computing resources of the second operation main body, and determining the operators which are not supported by the second operation main body in the first data processing model as the operators to be optimized.
3. The method of claim 1, wherein constructing the optimization operator corresponding to the operator to be optimized by using the operator supported by the second operation subject comprises:
and constructing and obtaining an optimization operator corresponding to the operator to be optimized by combining or optimizing the operators supported by the second operation main body.
4. The method according to claim 3, wherein the constructing an optimization operator corresponding to the operator to be optimized by combining or optimizing the operators supported by the second operation subject comprises:
determining a first type of operator to be optimized and a second type of operator to be optimized from each operator to be optimized; the first type of operator to be optimized is an operator which is not supported by the second operation main body, and the second type of operator to be optimized is an operator which has a different data processing mode from the operator of the same type supported by the second operation main body;
the operators supported by the second operation main body are combined to construct an operator with the same function as the first type of operator to be optimized, and an optimization operator corresponding to the first type of operator to be optimized is obtained;
and optimizing an operator which is supported by the second operation main body and has the same type as the operator to be optimized of the second type to obtain an optimization operator corresponding to the operator to be optimized of the second type.
5. The method of claim 4, wherein the first type of operator to be optimized is an attention mechanism operator or a layer normalization operator;
the method for constructing an operator with the same function as the first type of operator to be optimized by combining operators supported by the second operation main body to obtain an optimization operator corresponding to the first type of operator to be optimized includes:
if the first type of operator to be optimized is an attention mechanism operator, constructing by using a neural network, a batch normalization operator and an activation function which are matched with the functions of the attention mechanism operator to obtain a semi-optimization operator corresponding to the attention mechanism operator;
integrating the batch normalization operator in the semi-optimization operator into a convolution operator to obtain an optimization operator corresponding to the attention mechanism operator;
or,
if the first type of operator to be optimized is a layer normalization operator, integrating a batch normalization operator matched with the layer normalization operator into a convolution operator to obtain an optimization operator with the same function as the layer normalization operator;
wherein the neural network, the convolution operator, and the activation function are all operators supported by the second operational agent.
6. The method of claim 4, wherein the second type of operator to be optimized is an asymmetric convolution operator;
wherein optimizing the operator that is supported by the second operation main body and is of the same type as the second type of operator to be optimized, to obtain the optimization operator corresponding to the second type of operator to be optimized, comprises:
adding a data dimension processing module to the symmetric convolution operator supported by the second operation main body to obtain an optimization operator corresponding to the asymmetric convolution operator;
the data dimension processing module is configured to adjust an input data dimension of the symmetric convolution operator, so that the input data dimension meets the requirement of the symmetric convolution operator.
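One plausible reading of claim 6's "data dimension processing module" is a padding step in front of the supported symmetric convolution. For example, a causal (asymmetric) convolution, common in streaming speech models, can be reproduced on a runtime that only offers a plain 'valid' convolution by left-padding the input first. This is an illustrative sketch under that assumption, not the patent's definition:

```python
def conv1d_valid(x, w, b):
    # the symmetric convolution the target runtime supports
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) + b
            for i in range(len(x) - k + 1)]

def causal_conv_via_valid(x, w, b):
    # "data dimension processing": left-pad with k-1 zeros so the plain
    # 'valid' convolution behaves like an asymmetric (causal) one --
    # each output frame then depends only on current and past inputs
    k = len(w)
    padded = [0.0] * (k - 1) + list(x)
    return conv1d_valid(padded, w, b)
```

The padding restores the input dimension expected by the symmetric operator, so the output length matches the input length and no future frames leak into any output.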
7. The method of claim 1, further comprising:
training the second data processing model.
8. The method of claim 7, wherein training the second data processing model comprises:
obtaining model training data and a data processing reference model corresponding to the second data processing model;
performing multi-task joint training on the second data processing model and the data processing reference model by using the model training data;
wherein the data processing reference model is a data processing model whose performance is higher than that of the second data processing model.
9. The method of claim 8, wherein the model training data is speech recognition training data;
the performing multi-task joint training on the second data processing model and the data processing reference model by using the model training data includes:
inputting the speech recognition training data into the data processing reference model and the second data processing model, respectively, so that the data processing reference model performs non-streaming recognition on the speech recognition training data, and so that the second data processing model performs streaming recognition on the speech recognition training data;
calculating a first loss function corresponding to the second data processing model and a second loss function corresponding to the data processing reference model;
and adjusting parameters of the second data processing model by using the first loss function and the second loss function until both the first loss function and the second loss function fall within a preset loss range.
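The joint-training loop of claims 8 and 9 can be sketched with a deliberately tiny toy: a one-parameter "student" (the second data processing model) trained against both the labels and a fixed one-parameter "reference" model. Treating the second loss as a student-vs-reference distillation term is an assumption made for this sketch; the claims only state that two losses are computed and driven into a preset range.

```python
def joint_train(student_w, teacher_w, data, lr=0.05, tol=1e-3, max_steps=1000):
    # toy linear models: prediction = w * x; the student is updated with the
    # gradient of both losses until both fall within the tolerance
    for _ in range(max_steps):
        l1 = l2 = grad = 0.0
        for x, y in data:
            s, t = student_w * x, teacher_w * x
            l1 += (s - y) ** 2                      # first loss: student vs. label
            l2 += (s - t) ** 2                      # second loss: student vs. reference
            grad += 2 * (s - y) * x + 2 * (s - t) * x
        if l1 < tol and l2 < tol:                   # both within preset loss range
            break
        student_w -= lr * grad / len(data)
    return student_w
```

In the patent's setting the reference model runs non-streaming recognition while the student runs streaming recognition on the same utterances, but the stopping criterion (both losses inside a preset range) has the same shape as above.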
10. The method of claim 9, further comprising, before the multi-task joint training of the second data processing model and the data processing reference model by using the model training data:
performing speaker information normalization on the speech recognition training data.
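Claim 10 does not define "speaker information normalization"; a common realization in speech front-ends is per-speaker cepstral mean and variance normalization (CMVN), which is the assumption behind this sketch:

```python
def speaker_cmvn(frames):
    # per-speaker mean/variance normalization: for one speaker's feature
    # frames, shift each feature dimension to zero mean and scale it to
    # unit variance, removing speaker-dependent channel/level information
    dims, n = len(frames[0]), len(frames)
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    vars_ = [sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dims)]
    eps = 1e-8  # guard against constant dimensions
    return [[(f[d] - means[d]) / (vars_[d] + eps) ** 0.5 for d in range(dims)]
            for f in frames]
```

Applied per speaker before the joint training, this keeps the streaming and reference models from latching onto speaker identity instead of content.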
11. A data processing model building apparatus, comprising:
an operator determining unit, configured to determine an operator to be optimized from a first data processing model running on a first operation main body, the operator to be optimized being an operator that is not supported by a second operation main body;
an operator construction unit, configured to construct, by using operators supported by the second operation main body, an optimization operator corresponding to the operator to be optimized, the optimization operator realizing the same function as the corresponding operator to be optimized;
and an operator replacing unit, configured to replace the operator to be optimized in the first data processing model with the optimization operator, to obtain a second data processing model.
12. A data processing model building apparatus, characterized by comprising:
a memory and a processor;
wherein the memory is connected to the processor and is configured to store a program;
and the processor is configured to implement the data processing model construction method according to any one of claims 1 to 10 by executing the program stored in the memory.
13. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a processor, implements a data processing model construction method according to any one of claims 1 to 10.
CN202210479401.4A 2022-04-26 2022-04-26 Data processing model construction method, device, equipment and storage medium Pending CN114781620A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210479401.4A CN114781620A (en) 2022-04-26 2022-04-26 Data processing model construction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114781620A true CN114781620A (en) 2022-07-22

Family

ID=82435633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210479401.4A Pending CN114781620A (en) 2022-04-26 2022-04-26 Data processing model construction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114781620A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115527525A (en) * 2022-11-23 2022-12-27 广州小鹏汽车科技有限公司 Speech recognition model generation method, speech interaction method, vehicle, and storage medium
CN117372846A (en) * 2023-10-17 2024-01-09 湖南苏科智能科技有限公司 Target detection method, platform, device and equipment based on embedded platform

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190130257A1 (en) * 2017-10-27 2019-05-02 Sentient Technologies (Barbados) Limited Beyond Shared Hierarchies: Deep Multitask Learning Through Soft Layer Ordering
CN110020720A (en) * 2019-04-01 2019-07-16 北京中科寒武纪科技有限公司 Operator joining method and device
CN110458294A (en) * 2019-08-19 2019-11-15 Oppo广东移动通信有限公司 Model running method, device, terminal and storage medium
CN111028226A (en) * 2019-12-16 2020-04-17 北京百度网讯科技有限公司 Method and device for algorithm transplantation
CN113469360A (en) * 2020-03-31 2021-10-01 杭州海康威视数字技术股份有限公司 Inference method and device
CN112200297A (en) * 2020-09-04 2021-01-08 厦门星宸科技有限公司 Neural network optimization method, device and processor
CN113902000A (en) * 2021-09-30 2022-01-07 京东科技信息技术有限公司 Model training, synthetic frame generation, video recognition method and device, and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HW140701: "TensorRT - A general method of replacing the torch.einsum Einstein-summation operator with combinations of ordinary torch operators", Retrieved from the Internet <URL:https://blog.csdn.net/HW140701/article/details/120654252> *

Similar Documents

Publication Publication Date Title
Jiang et al. Fedmp: Federated learning through adaptive model pruning in heterogeneous edge computing
Terefe et al. Energy-efficient multisite offloading policy using Markov decision process for mobile cloud computing
CN114781620A (en) Data processing model construction method, device, equipment and storage medium
US20080240226A1 (en) Record compression using incremental reverse templating
US20200195580A1 (en) Parallel data processing for service function chains spanning multiple servers
CN110348526A (en) A kind of device type recognition methods and device based on semi-supervised clustering algorithm
CN115136230A (en) Unsupervised singing voice conversion based on tone confrontation network
CN112015402A (en) Method, device and electronic device for rapid establishment of business scenarios
Roy et al. Internet of Music Things: an edge computing paradigm for opportunistic crowdsensing
JP7367257B1 (en) Communication management device and communication management method
Feng et al. Time-domain sound field reproduction using the group Lasso
CN118689491A (en) A retrieval enhancement generation deployment method based on edge computing
CN113052198A (en) Data processing method, device, equipment and storage medium
Jain et al. Data-prediction model based on stepwise data regression method in wireless sensor network
CN114599042B (en) Network state sensing method and device, electronic equipment and storage medium
US20090319645A1 (en) Method, Apparatus, and Computer Program Product for Distributed Information Management
US20220114457A1 (en) Quantization of tree-based machine learning models
WO2022089321A1 (en) Method and apparatus for scheduling access point, and server and storage medium
CN115242787A (en) Message processing system and method
CN119485338A (en) Wireless network load dynamic adjustment method, training method, device and electronic equipment
CN103079194A (en) Method, device and system of service adaptation
Su et al. Expediting in-network federated learning by voting-based consensus model compression
CN113076399A (en) Data searching method and device
CN114153714A (en) Method, device, device and storage medium for capacity adjustment based on log information
Al-Tameemi et al. Vector quantization based QoS evaluation in cognitive radio networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination