CN114881203A - Model inference method, device and electronic device - Google Patents
Model inference method, device and electronic device
- Publication number
- CN114881203A (application CN202210389034.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- weight
- matrix
- weight matrix
- processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The present disclosure provides a model inference method, device and electronic device, relating to artificial intelligence technical fields such as deep learning and computer vision. The specific implementation scheme is: performing an equivalent transformation on the input picture data of a target model to obtain an input matrix; determining a data processing mode of the input matrix based on the data amount of the input picture data, the data processing mode being used to represent the processing priority of the row data of the input matrix relative to the column data; and multiplying, according to the data processing mode, a first weight matrix of the target model with the input matrix to obtain a model inference result, the first weight matrix being obtained by sparsifying, based on a preset sparsity, a second weight matrix obtained by training the target model.
Description
Technical Field
The present disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning and computer vision, and specifically to a model inference method, device and electronic device.
Background
With the rapid development of artificial intelligence, fields of computer vision such as image classification and recognition place ever higher real-time requirements on neural network model inference. It is therefore crucial to achieve lightweight, high-performance and low-power inference acceleration on mobile terminals.
To address the large data storage and high computational complexity of neural network models, the weight matrix of a neural network model is usually sparsified by pruning, and a convolution is then computed between the sparsified weight matrix and the input image data to obtain the model inference result.
Summary of the Invention
The present disclosure provides a model inference method, device and electronic device.
According to a first aspect of the present disclosure, a model inference method is provided, comprising:
performing an equivalent transformation on the input picture data of a target model to obtain an input matrix;
determining a data processing mode of the input matrix based on the data amount of the input picture data, the data processing mode being used to represent the processing priority of the row data of the input matrix relative to the column data; and
multiplying, according to the data processing mode, a first weight matrix of the target model with the input matrix to obtain a model inference result, the first weight matrix being obtained by sparsifying, based on a preset sparsity, a second weight matrix obtained by training the target model.
According to a second aspect of the present disclosure, a model inference device is provided, comprising:
a conversion module, configured to perform an equivalent transformation on the input picture data of a target model to obtain an input matrix;
a determination module, configured to determine a data processing mode of the input matrix based on the data amount of the input picture data, the data processing mode being used to represent the processing priority of the row data of the input matrix relative to the column data; and
a multiplication processing module, configured to multiply, according to the data processing mode, a first weight matrix of the target model with the input matrix to obtain a model inference result, the first weight matrix being obtained by sparsifying, based on a preset sparsity, a second weight matrix obtained by training the target model.
According to a third aspect of the present disclosure, an electronic device is provided, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform any one of the methods of the first aspect.
According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, the computer instructions being used to cause a computer to perform any one of the methods of the first aspect.
According to a fifth aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements any one of the methods of the first aspect.
The technology according to the present disclosure solves the problem of relatively low model inference efficiency and improves the efficiency of model inference.
It should be understood that the content described in this section is not intended to identify key or critical features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief Description of the Drawings
The accompanying drawings are provided for a better understanding of the present solution and do not constitute a limitation of the present disclosure. In the drawings:
FIG. 1 is a schematic flowchart of a model inference method according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the computation principle of a first weight matrix and an input matrix;
FIG. 3 is a schematic diagram of the principle of processing an input matrix column-first, then row-wise;
FIG. 4 is a schematic diagram of the principle of processing an input matrix row-first, then column-wise;
FIG. 5 is a schematic diagram of the data storage format of the first weight matrix;
FIG. 6 is a schematic structural diagram of a model inference device according to a second embodiment of the present disclosure;
FIG. 7 is a schematic block diagram of an example electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments of the present disclosure to facilitate understanding, which should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
First Embodiment
As shown in FIG. 1, the present disclosure provides a model inference method, comprising the following steps:
Step S101: Perform an equivalent transformation on the input picture data of a target model to obtain an input matrix.
In this embodiment, the model inference method relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning and computer vision, and can be widely applied in image processing scenarios such as image recognition and segmentation. The model inference method of the embodiments of the present disclosure may be executed by the model inference device of the embodiments of the present disclosure, which may be configured in any electronic device to execute the method. The electronic device may be a server or a terminal device, which is not specifically limited here.
The target model may be a neural network model, which may include at least one convolution kernel, each convolution kernel corresponding to at least one channel. The weight matrix of the target model can be constructed by training the weight that each convolution kernel applies during the convolution computation. When an image is input to the target model, the target model can convolve the weight matrix with the input image data to perform model inference and obtain a model inference result, thereby realizing image processing.
The input picture data may be the pixel data of the input picture: the pixel data of one picture, of two pictures, or even of multiple pictures. The input picture of the target model may be a grayscale picture or a color picture, which is not specifically limited here.
When the input picture of the target model is a color picture, or when there are at least two input pictures, the input picture data is three-dimensional. For example, for a color picture, its pixel data may be C1×H×W three-dimensional data, where C1 is the number of channels of the color picture, H is its height, and W is its width.
When the target model has multiple input pictures of the same size, the input picture data may be C2×H×W three-dimensional data, where C2 may be the sum of the numbers of channels of the multiple input pictures.
The data layout of the input picture data of the target model can be equivalently converted in the im2col manner or in another manner. Taking im2col as an example, im2col equivalently rearranges three-dimensional data into rows and columns in order to optimize the convolution operation, for example transforming the three-dimensional picture data C2×H×W into a C2×C3 two-dimensional matrix.
In an optional implementation, the data of each channel in the input picture data can be expanded into one row of data, and the data of the multiple channels are arranged row by row into a two-dimensional matrix, yielding the input matrix. The number of rows of the input matrix is the total number of channels of the input picture data, and the number of columns of the input matrix is H×W.
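A minimal sketch of this layout transformation, assuming NumPy and the channel-per-row case described above (which corresponds to im2col with a 1×1 kernel; a k×k kernel would instead gather k·k·C-element patch columns):

```python
import numpy as np

def to_input_matrix(picture: np.ndarray) -> np.ndarray:
    """Equivalently reshape C x H x W picture data into a 2-D input matrix:
    each channel becomes one row, so the result is C rows by H*W columns."""
    c, h, w = picture.shape
    return picture.reshape(c, h * w)

# A 3-channel 4x5 picture becomes a 3 x 20 input matrix.
picture = np.arange(3 * 4 * 5, dtype=np.float32).reshape(3, 4, 5)
input_matrix = to_input_matrix(picture)
assert input_matrix.shape == (3, 20)
```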
In this step, by equivalently converting the input picture data of the target model into a matrix, the convolution operation can be optimized into a matrix multiplication, thereby simplifying the inference computation of the model.
Step S102: Determine a data processing mode of the input matrix based on the data amount of the input picture data, the data processing mode being used to represent the processing priority of the row data of the input matrix relative to the column data.
In this step, the data amount of the input picture data may be the number of pixels of the input picture data; for example, for C2×H×W input picture data, the data amount equals C2*H*W.
The data processing mode of the input matrix can be used to represent the processing priority of the row data of the input matrix relative to the column data. It covers two cases: in the first, the processing priority of the row data is higher than that of the column data; in the second, the processing priority of the column data is higher than that of the row data.
A higher processing priority for row data indicates that all the row data is processed before the column data, while a higher processing priority for column data indicates that all the column data is processed before the row data.
The rows and columns of the input matrix can be accessed adaptively based on the data amount of the input picture data, so as to determine how the input matrix is computed.
In this step, the rows and columns of the input matrix are accessed adaptively based on the data amount of the input picture data. When the data amount is small, the row data is likely to be short, and one row of the input matrix can be fetched quickly by data prefetching for the model inference computation, so a row-first, then column-wise computation over the input matrix yields faster model inference. When the data amount is large, the row data is likely to be long, so a column-first, then row-wise computation over the input matrix yields faster model inference.
Step S103: Multiply, according to the data processing mode, a first weight matrix of the target model with the input matrix to obtain a model inference result, the first weight matrix being obtained by sparsifying, based on a preset sparsity, a second weight matrix obtained by training the target model.
In this step, the first weight matrix of the target model may be a sparsified weight matrix. The target model may be trained using techniques such as transfer learning, the automated deep learning platform AutoDL, and hyperparameter tuning, so that the weight values of the target model are obtained by training on a business data set, yielding the second weight matrix.
The number of rows of the second weight matrix is the total number of channels of the target model, and the number of columns is the number of convolution kernels; the total number of channels of the target model is usually equal to the total number of channels of the input picture data.
The second weight matrix can be sparsified based on a preset sparsity. Sparsification refers to increasing the proportion of zero values in the weight matrix, that is, changing non-zero values in the weight matrix to zero, so as to reduce the number of effective weights in the weight matrix and compress the target model. The preset sparsity can be set according to the actual situation; usually, to speed up model inference as much as possible and improve its real-time performance, the preset sparsity may be greater than 50%.
The second weight matrix may be sparsified using an unstructured pruning technique or a structured pruning technique, which is not specifically limited here.
Unstructured pruning may refer to pruning arbitrary convolution factors, i.e., weight values, within the convolution kernels; that is, unstructured pruning allows clipping at any position of the weight matrix. In an optional implementation, the second weight matrix may be sparsified according to the importance factor of each effective weight in the second weight matrix, for example by setting to zero the relatively unimportant effective weights that rank last. Structured pruning operates on whole channels and convolution kernels, for example setting the entire weight group of one row of the weight matrix (corresponding to an entire convolution kernel) to zero.
After the first weight matrix is obtained, it is stored. Then, during model inference, the first weight matrix of the target model and the input matrix can be multiplied according to the data processing mode adaptively determined from the data amount of the input picture data, so as to obtain a model inference result.
FIG. 2 is a schematic diagram of the computation principle of the first weight matrix and the input matrix. As shown in FIG. 2, the left diagram shows the first weight matrix, in which each square represents a weight: a square containing a value represents an effective weight, the value in the square being the weight value, while a square containing no value represents an invalid weight whose value is zero. The right diagram in FIG. 2 shows the input matrix. The number of columns of the first weight matrix equals the number of rows of the input matrix, so the first weight matrix can be multiplied with the input matrix to obtain the model inference result.
While the first weight matrix of the target model and the input matrix are being multiplied, the relevant data of the input matrix can be continuously fetched through the cache memory using data prefetching, and inline assembly instructions can be used to perform the vector multiplication of the rows of the first weight matrix with the columns of the input matrix to obtain the matrix multiplication result.
A bias term can be accumulated onto the data in the matrix multiplication result to obtain the model inference result, which can then be passed to a post-processing module to realize image processing.
In this embodiment, an input matrix is obtained by performing an equivalent transformation on the input picture data of the target model; a data processing mode of the input matrix is determined based on the data amount of the input picture data, the data processing mode being used to represent the processing priority of the row data of the input matrix relative to the column data; and the first weight matrix of the target model is multiplied with the input matrix according to the data processing mode to obtain a model inference result, the first weight matrix being obtained by sparsifying, based on a preset sparsity, a second weight matrix obtained by training the target model. In this way, the inference speed of the compressed model can be increased, thereby improving the real-time performance of model inference.
Optionally, step S102 specifically includes:
when the data amount of the input picture data is greater than a preset threshold, determining that the data processing mode is a first processing mode, the first processing mode being used to represent that the processing priority of the column data of the input matrix is higher than that of the row data; and
when the data amount of the input picture data is less than or equal to the preset threshold, determining that the data processing mode is a second processing mode, the second processing mode being used to represent that the processing priority of the row data of the input matrix is higher than that of the column data.
In this implementation, a preset threshold can be set, for example 128K. When the data amount is less than or equal to 128K, the data amount of the input picture data is considered relatively small, and the data processing mode of the input matrix is determined to be row-first, then column-wise; otherwise it is determined to be column-first, then row-wise. In this way, row-column access of the input matrix can be performed adaptively according to the data amount of the input picture data.
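A minimal sketch of this mode selection, assuming the 128K pixel threshold above (the mode names are illustrative, not from the source):

```python
ROW_FIRST = "row_first"        # second processing mode: row data first
COLUMN_FIRST = "column_first"  # first processing mode: column data first

def choose_processing_mode(c: int, h: int, w: int,
                           threshold: int = 128 * 1024) -> str:
    """Pick the traversal order from the data amount (pixel count C*H*W)."""
    data_amount = c * h * w
    return COLUMN_FIRST if data_amount > threshold else ROW_FIRST
```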
Optionally, step S103 specifically includes:
when the data processing mode is the first processing mode, acquiring target column data in the input matrix, and performing vector multiplication of each row of data in the first weight matrix with the target column data in the input matrix to obtain a model inference result, the target column data being at least part of the column data in the input matrix; and
when the data processing mode is the second processing mode, acquiring, for target row data in the first weight matrix, all the column data in the input matrix, and performing vector multiplication of the target row data in the first weight matrix with each column of data in the input matrix to obtain a model inference result, the target row data being any row of data in the first weight matrix.
In this implementation, when the data processing mode determined for the input matrix is column-first, then row-wise, the column data can be processed before the row data. That is, some column data of the input matrix is acquired, and each row of data in the first weight matrix is vector-multiplied with those columns of the input matrix, producing part of the data of the model inference result. If the multiplication of the first weight matrix with the input matrix is not yet complete, column data of the input matrix that has not yet been computed continues to be acquired, and each row of data in the first weight matrix is vector-multiplied with those columns, producing further data of the model inference result, until the multiplication of the first weight matrix with the input matrix is complete, i.e., the processing of all the row data of the input matrix is finished. In this way, the matrix multiplication of the first weight matrix and the input matrix, i.e., the model inference computation, can be realized.
FIG. 3 is a schematic diagram of the principle of processing the input matrix column-first, then row-wise. As shown in FIG. 3, after the column data of columns 1 to 4 of the input matrix 301 (the 4×8 matrix to the left of the equals sign in FIG. 3) is acquired as target column data 3011, each row of data (denoted 3021, 3022, 3023 and 3024) of the first weight matrix 302 (the 4×4 matrix in FIG. 3) is vector-multiplied with the target column data of the input matrix, yielding part of the data of the model inference result 303 (the 4×8 matrix to the right of the equals sign in FIG. 3), denoted 3031, 3032, 3033 and 3034 respectively. The processing may proceed successively according to the computation principles of markers 1, 2, 3 and 4 (the numeric markers to the left of the input matrix in FIG. 3).
When the data processing mode determined for the input matrix is row-first, then column-wise, the row data can be processed before the column data. For one row of data in the first weight matrix, all the column data in the input matrix is acquired, and that row of the first weight matrix is vector-multiplied with each column of data in the input matrix, producing part of the data of the model inference result. If the multiplication of the first weight matrix with the input matrix is not yet complete, another row of data in the first weight matrix is taken and vector-multiplied with each column of data in the input matrix, producing further data of the model inference result, until the multiplication of the first weight matrix with the input matrix is complete, i.e., the processing of all the row data is finished. In this way, the matrix multiplication of the first weight matrix and the input matrix, i.e., the model inference computation, can be realized.
FIG. 4 is a schematic diagram of the principle of processing the input matrix row-first, then column-wise. As shown in FIG. 4, the computation principle represented by markers 1 and 2 is: for the row data 4011 of row 1 of the first weight matrix 401 (the 4×4 matrix in FIG. 4), that row is vector-multiplied with each column of data in the input matrix 402 (the 4×8 matrix to the left of the equals sign in FIG. 4), yielding part of the data of the model inference result 403 (the 4×8 matrix to the right of the equals sign in FIG. 4), denoted 4031 and 4032. The computation principle represented by markers 3 and 4 is: for the row data 4012 of row 2 of the first weight matrix 401, that row is vector-multiplied with each column of data in the input matrix 402, yielding further data of the model inference result 403, denoted 4033 and 4034.
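The two traversal orders can be sketched over a dense matrix multiplication as below; this illustrates the loop structure only (block size and function names are assumptions), not the NEON-assembly implementation mentioned above:

```python
import numpy as np

def matmul_column_first(weights: np.ndarray, inputs: np.ndarray,
                        col_block: int = 4) -> np.ndarray:
    """Column-first, then row-wise (FIG. 3): fetch a block of input
    columns, multiply every weight row against it, then move on."""
    m, n = weights.shape[0], inputs.shape[1]
    out = np.zeros((m, n), dtype=weights.dtype)
    for c0 in range(0, n, col_block):            # outer loop: column blocks
        cols = inputs[:, c0:c0 + col_block]      # target column data
        for r in range(m):                       # every weight row vs block
            out[r, c0:c0 + col_block] = weights[r] @ cols
    return out

def matmul_row_first(weights: np.ndarray, inputs: np.ndarray) -> np.ndarray:
    """Row-first, then column-wise (FIG. 4): take one weight row and
    multiply it with all input columns before moving to the next row."""
    m, n = weights.shape[0], inputs.shape[1]
    out = np.zeros((m, n), dtype=weights.dtype)
    for r in range(m):                           # outer loop: weight rows
        out[r, :] = weights[r] @ inputs          # all columns at once
    return out
```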
Optionally, the first weight matrix is represented by first information, the first information including position information of the effective weights in the first weight matrix, weight data, and information on the number of effective weights in each row. Step S103 specifically includes:
for each row in the first weight matrix, acquiring the weight data and position information of the effective weights of that row based on the information on the number of effective weights of that row;
acquiring, based on the position information of the effective weights of the row, the data in each column of the input matrix corresponding to that position information; and
performing vector multiplication of the weight data of the effective weights of the row in the first weight matrix with the data in each column of the input matrix corresponding to the position information, to obtain a model inference result.
In this implementation, to reduce the data storage amount of the target model, the first weight matrix can be represented by first information that includes the position information of the effective weights in the first weight matrix, the weight data, and information on the number of effective weights in each row; that is, only the information related to the effective weights in the first weight matrix needs to be stored.
The position information of the effective weights in the first weight matrix can be expressed as the actual position of each effective weight in the first weight matrix, for example, effective weight A is located in row 1, column 1. It can also be expressed as the position difference between every two adjacent effective weights determined in a preset manner, for example, storing the position difference between every two adjacent effective weights in the traversal order of columns from left to right and rows from top to bottom.
Correspondingly, when the first weight matrix and the input matrix are multiplied, the multiplication can proceed according to the positions of the effective weights in the first weight matrix. For example, for the first row of the first weight matrix, the weight data of the effective weights of that row can be multiplied with the data at the corresponding effective-weight positions in each column of the input matrix and accumulated, to obtain the model inference result.
In implementation, for each row of the first weight matrix, the weight data and position information of the effective weights of that row can be acquired based on the information on the number of effective weights of that row, and the weight data of the effective weights of that row can be vector-multiplied with the data in each column of the input matrix corresponding to the position information, to obtain the model inference result.
As shown in FIG. 2, for the first row of the first weight matrix, the weight data of the effective weights is located in columns 1, 3, 4, 5, 7, 9, 10 and 12, so the effective data of that row can be vector-multiplied with the data of rows 1, 3, 4, 5, 7, 9, 10 and 12 in each column of the input matrix to obtain the model inference result.
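A minimal sketch of this sparse row-times-column computation, assuming the per-row counts, column positions and weight values are already decoded (the array names are illustrative; the stored "offset" encoding is described with FIG. 5 below):

```python
import numpy as np

def sparse_matmul(values: np.ndarray, col_indices: np.ndarray,
                  row_counts: np.ndarray, inputs: np.ndarray) -> np.ndarray:
    """Multiply a sparse first weight matrix with a dense input matrix.

    values      - weight data of the effective weights, row by row
    col_indices - for each effective weight, its column in the weight
                  matrix, i.e., the input-matrix row it multiplies
    row_counts  - number of effective weights in each weight-matrix row
    """
    out = np.zeros((len(row_counts), inputs.shape[1]), dtype=inputs.dtype)
    pos = 0
    for r, count in enumerate(row_counts):
        w = values[pos:pos + count]          # this row's effective weights
        idx = col_indices[pos:pos + count]   # positions of those weights
        out[r, :] = w @ inputs[idx, :]       # zero weights are never touched
        pos += count
    return out
```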
In this implementation, the matrix multiplication with the input matrix, i.e., the model inference computation, is completed using the quantity information, positions and weight data of the effective weights in the first weight matrix. This reduces the computational complexity of the target model and further increases the model inference speed.
Optionally, before step S103, the method further includes:
performing, based on the size of the data cache unit and the size of the vector processing unit, block processing on the input matrix with respect to the row data, to obtain multiple data processing blocks;
and step S103 specifically includes:
performing, according to the data processing mode, vector multiplication of each row of data in the first weight matrix of the target model with the column data constructed from the multiple data processing blocks, to obtain a model inference result.
In this implementation, the input matrix can be partitioned into row and column blocks to match the hardware cache size of the central processing unit (CPU) and the data size processed per instruction, improving data access efficiency and data computation efficiency.
In an optional implementation, the cache may be the L1 cache, whose data cache unit, i.e., the cache line, may be 64. To improve data access efficiency, the input matrix can be block-processed along the row data in units of 64, so that the data of the input matrix can be read faster.
In addition, inline assembly instructions can be used to implement the vector multiplication of the row data of the first weight matrix with the column data of the input matrix. To match the size of the vector processing unit, the input matrix can be block-processed in integer multiples of the vector processing unit. For example, when NEON assembly instructions are used to process data, the vector processing unit is 8, and the input matrix can be block-processed in units of integer multiples of 8.
To fit the data boundary of the input matrix, when the remaining data is not enough for a block of 8, block processing can also be performed in units of 4, or even in units of 1.
For example, one row of data with 125 columns can be divided into 6 data processing blocks, with 64, 32, 16, 8, 4 and 1 columns respectively.
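A minimal sketch of such a greedy block decomposition, assuming the block sizes listed above:

```python
def split_into_blocks(n_cols: int,
                      block_sizes=(64, 32, 16, 8, 4, 1)) -> list:
    """Split one row of n_cols elements into cache- and vector-friendly
    blocks, taking the largest block size that still fits each time."""
    blocks, remaining = [], n_cols
    for size in block_sizes:
        while remaining >= size:
            blocks.append(size)
            remaining -= size
    return blocks

# The 125-column example above: 64 + 32 + 16 + 8 + 4 + 1 = 125.
assert split_into_blocks(125) == [64, 32, 16, 8, 4, 1]
```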
Correspondingly, the data processing blocks can be continuously fetched through the cache using data prefetching to obtain the relevant data of the input matrix, and inline assembly instructions can be used to vector-multiply the row data of the first weight matrix with the column data constructed from the multiple data processing blocks, to obtain the model inference result. In this way, data access efficiency and data computation efficiency can be improved, further increasing the efficiency of model inference.
The foregoing describes in detail how the target model performs model inference based on the first weight matrix obtained after compression. The compression process of the target model, i.e., the process of obtaining the first weight matrix, is described in detail below.
Optionally, the method further includes:
acquiring second information obtained after the target model is trained, the second information including the second weight matrix and the importance factor of each effective weight in the second weight matrix; and
sparsifying, based on a preset sparsity, the second weight matrix according to the importance factor of each effective weight in the second weight matrix, to obtain the first weight matrix of the target model.
In this implementation, unstructured pruning can be performed on the target model.
The importance factor of each effective weight in the second weight matrix can represent the importance of that effective weight in the model inference process: the larger the importance factor, the greater its influence on the model inference result, i.e., the more important the effective weight.
The second information can be acquired in several ways. For example, the target model can be trained using techniques such as transfer learning, the automated deep learning platform AutoDL, and hyperparameter tuning, so that the weight values of the target model are obtained by training on a business data set, yielding the second weight matrix; the training can also yield the importance factor of each effective weight in the second weight matrix. Alternatively, the second information sent by another electronic device after it trains the target model can be received.
Once the second information is obtained, the effective weights in the second weight matrix can be sorted in descending order of their importance factors, and, based on the preset sparsity, the effective weights ranked last are set to zero, thereby sparsifying the second weight matrix to obtain the first weight matrix. For example, if the sparsity of the second weight matrix is 40% and the preset sparsity is 60%, a further 20% of the weights in the second weight matrix need to be set to zero.
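A minimal sketch of this importance-based unstructured pruning, assuming an importance array of the same shape as the weights (function and variable names are illustrative):

```python
import numpy as np

def prune_to_sparsity(weights: np.ndarray, importance: np.ndarray,
                      preset_sparsity: float) -> np.ndarray:
    """Set the least-important effective weights to zero until the matrix
    reaches the preset sparsity (fraction of zero entries)."""
    pruned = weights.copy()
    target_zeros = int(np.ceil(weights.size * preset_sparsity))
    extra = target_zeros - int((weights == 0).sum())
    if extra <= 0:
        return pruned                             # already sparse enough
    nz = np.flatnonzero(weights)                  # effective weights
    ranked = nz[np.argsort(importance.flat[nz])]  # least important first
    pruned.flat[ranked[:extra]] = 0
    return pruned
```

For the example above (40% current sparsity, 60% preset), `extra` works out to 20% of the matrix entries.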
After the first weight matrix is obtained, model inference can be performed based on the first weight matrix.
In this implementation, the second weight matrix is sparsified based on the preset sparsity according to the importance factor of each effective weight in the second weight matrix. In this way, unstructured pruning is realized: by clipping effective weights according to their importance factors, a very high model compression rate can be achieved while the model inference accuracy is guaranteed.
Optionally, the first weight matrix is represented by first information, and the method further includes:
storing the weight value of each effective weight in the first weight matrix to obtain the weight data;
storing the position difference between every two adjacent effective weights in the first weight matrix, determined in a preset manner, to obtain the position information; and
storing the number of effective weights corresponding to each channel in the first weight matrix to obtain the quantity information;
wherein the first information includes the weight data, the position information and the quantity information.
In this implementation, the first weight matrix can be stored by means of the first information.
The first information can store the effective weights of the first weight matrix in different arrays, i.e., the first information may include multiple arrays; the first information can also store the effective weights of the first weight matrix as a data object, i.e., the first information may be a data object, which is not specifically limited here.
In an optional implementation, the first weight matrix can be traversed, storing in turn, in different arrays, the weight value of each effective weight, the address difference between the current non-zero weight value and the previous non-zero weight value, and the number of effective weights of the corresponding channel.
FIG. 5 is a schematic diagram of the data storage format of the first weight matrix. As shown in FIG. 5, the left diagram is the first weight matrix, which can be traversed in the order 501 (rows from left to right, columns from top to bottom). The address difference between the current non-zero weight value and the previous non-zero weight value is stored in the "offset" array, the number of effective weights of the corresponding channel is stored in the "nonzero_num" array, and the weight values of the effective weights are stored in the "val" array.
If the row index of the current non-zero weight value is greater than that of the previous non-zero weight value, the position difference is greater than 0; if it is smaller, the position difference is less than 0; and if the row index of the current non-zero weight value equals that of the previous non-zero weight value, the position difference equals 0.
In addition, the actual position of the first effective weight in the first weight matrix can be stored, so that, based on this actual position and the "offset" array, the actual position of every effective weight in the first weight matrix can be determined.
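A minimal sketch of building these arrays, under one reading of the traversal order (columns visited left to right, rows top to bottom within each column, which is consistent with the sign rule for the position difference above); the array names follow FIG. 5, the rest is an assumption:

```python
import numpy as np

def encode_first_weight_matrix(weights: np.ndarray):
    """Store only the effective weights: 'val', 'offset', 'nonzero_num'
    plus the actual position of the first effective weight."""
    n_rows, n_cols = weights.shape
    val, offset = [], []
    nonzero_num = np.zeros(n_rows, dtype=int)  # per-channel (per-row) count
    prev_row, first_pos = None, None
    for c in range(n_cols):                    # columns left to right
        for r in range(n_rows):                # rows top to bottom
            w = weights[r, c]
            if w == 0:
                continue
            if prev_row is None:
                first_pos = (r, c)             # anchor for reconstruction
            else:
                # >0, <0 or ==0 depending on the row-index comparison
                offset.append(r - prev_row)
            prev_row = r
            val.append(w)
            nonzero_num[r] += 1
    return np.array(val), np.array(offset), nonzero_num, first_pos
```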
In this implementation, the information related to the effective weights of the first weight matrix is stored as the first information, and the position difference between every two adjacent effective weights in the first weight matrix is stored. This reduces the data storage amount of the model and further improves the model compression rate.
Optionally, the sparsifying, based on the preset sparsity, the second weight matrix according to the importance factor of each effective weight in the second weight matrix to obtain the first weight matrix of the target model includes:
sparsifying, based on the preset sparsity, the second weight matrix according to the importance factor of each effective weight in the second weight matrix, to obtain a third weight matrix; and
performing quantization on the third weight matrix to obtain the first weight matrix.
In this implementation, after the third weight matrix is obtained by sparsification, the effective weights in the third weight matrix can be quantized; the quantization yields a low-bit integer first weight matrix, which further reduces the data storage amount of the model and improves the model compression rate.
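The patent does not fix a quantization formula; a minimal sketch under the common symmetric per-tensor int8 scheme would be:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Quantize the sparsified weights to low-bit integers. Zeros map to
    zero, so the preset sparsity of the matrix is preserved."""
    max_abs = float(np.abs(weights).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantize with q.astype(np.float32) * scale
```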
Second Embodiment
As shown in FIG. 6, the present disclosure provides a model inference device 600, comprising:
a conversion module 601, configured to perform an equivalent transformation on the input picture data of a target model to obtain an input matrix;
a determination module 602, configured to determine a data processing mode of the input matrix based on the data amount of the input picture data, the data processing mode being used to represent the processing priority of the row data of the input matrix relative to the column data; and
a multiplication processing module 603, configured to multiply, according to the data processing mode, a first weight matrix of the target model with the input matrix to obtain a model inference result, the first weight matrix being obtained by sparsifying, based on a preset sparsity, a second weight matrix obtained by training the target model.
Optionally, the determination module 602 includes:
a first determination unit, configured to determine, when the data amount of the input picture data is greater than a preset threshold, that the data processing mode is a first processing mode, the first processing mode being used to represent that the processing priority of the column data of the input matrix is higher than that of the row data; and
a second determination unit, configured to determine, when the data amount of the input picture data is less than or equal to the preset threshold, that the data processing mode is a second processing mode, the second processing mode being used to represent that the processing priority of the row data of the input matrix is higher than that of the column data.
Optionally, the multiplication processing module 603 is specifically configured to:
when the data processing mode is the first processing mode, acquire target column data in the input matrix, and perform vector multiplication of each row of data in the first weight matrix with the target column data in the input matrix to obtain a model inference result, the target column data being at least part of the column data in the input matrix; and
when the data processing mode is the second processing mode, acquire, for target row data in the first weight matrix, all the column data in the input matrix, and perform vector multiplication of the target row data in the first weight matrix with each column of data in the input matrix to obtain a model inference result, the target row data being any row of data in the first weight matrix.
Optionally, the first weight matrix is represented by first information, the first information including position information of the effective weights in the first weight matrix, weight data, and information on the number of effective weights in each row, and the multiplication processing module 603 is specifically configured to:
for each row in the first weight matrix, acquire the weight data and position information of the effective weights of that row based on the information on the number of effective weights of that row;
acquire, based on the position information of the effective weights of the row, the data in each column of the input matrix corresponding to that position information; and
perform vector multiplication of the weight data of the effective weights of the row in the first weight matrix with the data in each column of the input matrix corresponding to the position information, to obtain a model inference result.
Optionally, the device further includes:
a block processing module, configured to perform, based on the size of the data cache unit and the size of the vector processing unit, block processing on the input matrix with respect to the row data, to obtain multiple data processing blocks;
wherein the multiplication processing module 603 is specifically configured to perform, according to the data processing mode, vector multiplication of each row of data in the first weight matrix of the target model with the column data constructed from the multiple data processing blocks, to obtain a model inference result.
Optionally, the device further includes:
an acquisition module, configured to acquire second information obtained after the target model is trained, the second information including the second weight matrix and the importance factor of each effective weight in the second weight matrix; and
a sparsification processing module, configured to sparsify, based on a preset sparsity, the second weight matrix according to the importance factor of each effective weight in the second weight matrix, to obtain the first weight matrix of the target model.
Optionally, the first weight matrix is represented by first information, and the device further includes:
a first storage module, configured to store the weight value of each effective weight in the first weight matrix to obtain the weight data;
a second storage module, configured to store the position difference between every two adjacent effective weights in the first weight matrix, determined in a preset manner, to obtain the position information; and
a third storage module, configured to store the number of effective weights corresponding to each channel in the first weight matrix to obtain the quantity information;
wherein the first information includes the weight data, the position information and the quantity information.
Optionally, the sparsification processing module is specifically configured to:
sparsify, based on the preset sparsity, the second weight matrix according to the importance factor of each effective weight in the second weight matrix, to obtain a third weight matrix; and
perform quantization on the third weight matrix to obtain the first weight matrix.
The model inference device 600 provided by the present disclosure can implement each process implemented by the embodiments of the model inference method and achieve the same beneficial effects; to avoid repetition, the details are not repeated here.
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision and disclosure of the user's personal information involved are all in compliance with the relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium and a computer program product.
FIG. 7 shows a schematic block diagram of an example electronic device that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 7, the electronic device 700 includes a computing unit 701, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. Various programs and data required for the operation of the device 700 can also be stored in the RAM 703. The computing unit 701, the ROM 702 and the RAM 703 are connected to one another through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Multiple components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706, such as a keyboard or a mouse; an output unit 707, such as various types of displays and speakers; a storage unit 708, such as a magnetic disk or an optical disc; and a communication unit 709, such as a network card, a modem or a wireless communication transceiver. The communication unit 709 allows the device 700 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 701 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc. The computing unit 701 performs the methods and processes described above, such as the model inference method. For example, in some embodiments, the model inference method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the model inference method described above can be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the model inference method in any other appropriate manner (for example, by means of firmware).
本文中以上描述的系统和技术的各种实施方式可以在数字电子电路系统、集成电路系统、场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、芯片上系统的系统(SOC)、负载可编程逻辑设备(CPLD)、计算机硬件、固件、软件、和/或它们的组合中实现。这些各种实施方式可以包括:实施在一个或者多个计算机程序中,该一个或者多个计算机程序可在包括至少一个可编程处理器的可编程系统上执行和/或解释,该可编程处理器可以是专用或者通用可编程处理器,可以从存储系统、至少一个输入装置、和至少一个输出装置接收数据和指令,并且将数据和指令传输至该存储系统、该至少一个输入装置、和该至少一个输出装置。Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips system (SOC), load programmable logic device (CPLD), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor that The processor, which may be a special purpose or general-purpose programmable processor, may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device an output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
The systems and techniques described herein may be implemented in a computing system that includes a back-end component (for example, as a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include a client and a server. The client and the server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, so long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The specific embodiments described above do not limit the protection scope of the present disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.
Claims (19)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210389034.9A CN114881203A (en) | 2022-04-13 | 2022-04-13 | Model reasoning method, device and electronic device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN114881203A (en) | 2022-08-09 |
Family
ID=82670647
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210389034.9A Pending CN114881203A (en) | 2022-04-13 | 2022-04-13 | Model reasoning method, device and electronic device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114881203A (en) |
- 2022-04-13: CN application CN202210389034.9A filed (publication CN114881203A (en); status: active, Pending)
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107944555A (en) * | 2017-12-07 | 2018-04-20 | Guangzhou Huaduo Network Technology Co., Ltd. | Method, storage device and terminal for compressing and accelerating a neural network |
| KR101929847B1 (en) * | 2018-05-15 | 2018-12-17 | Future Design System Co., Ltd. | Apparatus and method for computing a sparse matrix |
| CN110796235A (en) * | 2019-10-21 | 2020-02-14 | National University of Defense Technology | Vectorized implementation method of valid convolution of a convolutional neural network |
| CN112507284A (en) * | 2020-12-18 | 2021-03-16 | Tsinghua University | Method and device for realizing sparse matrix multiplication on a reconfigurable processor array |
| CN113313247A (en) * | 2021-02-05 | 2021-08-27 | Institute of Computing Technology, Chinese Academy of Sciences | Operation method of sparse neural network based on data flow architecture |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116820577A (en) * | 2023-06-13 | 2023-09-29 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Parallel processing method and device for models, first computing device, and electronic device |
| CN117371508A (en) * | 2023-09-28 | 2024-01-09 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Model compression method, device, electronic equipment and storage medium |
| CN117371508B (en) * | 2023-09-28 | 2025-02-21 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Model compression method, device, electronic equipment and storage medium |
| CN119598433A (en) * | 2024-11-08 | 2025-03-11 | Zhejiang University | Weight superposition-based large language model fingerprint adding method and equipment |
| CN119884575A (en) * | 2025-03-27 | 2025-04-25 | Inspur Software Technology Co., Ltd. | Artificial intelligence model accelerated training and reasoning method and device |
| CN119884575B (en) * | 2025-03-27 | 2025-08-19 | Inspur Software Technology Co., Ltd. | Artificial intelligence model accelerated training and reasoning method and device |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN113657390B (en) | Text detection model training method and text detection method, apparatus and device | |
| CN114881203A (en) | Model reasoning method, device and electronic device | |
| CN111507521B (en) | Electric power load forecasting method and forecasting device for a transformer district | |
| CN112990219B (en) | Method and device for image semantic segmentation | |
| CN115861131B (en) | Image-based video generation, model training method, device, and electronic equipment | |
| CN113947144B (en) | Method, apparatus, device, medium and program product for object detection | |
| CN113901904A (en) | Image processing method, face recognition model training method, device and equipment | |
| CN113902010A (en) | Training method of classification model and image classification method, apparatus, equipment and medium | |
| CN113361536B (en) | Image semantic segmentation model training, image semantic segmentation method and related device | |
| CN112966744A (en) | Model training method, image processing method, device and electronic equipment | |
| CN113920313B (en) | Image processing method, image processing device, electronic equipment and storage medium | |
| CN114119989A (en) | Image feature extraction model training method, device and electronic device | |
| CN114444686A (en) | A method, apparatus and related device for quantizing model parameters of a convolutional neural network | |
| CN114418086A (en) | Method and device for compressing neural network model | |
| CN113240082A (en) | A method and device for transfer learning | |
| CN116612288A (en) | Multi-scale lightweight real-time semantic segmentation method and system | |
| CN113436292B (en) | Image processing method, training method, device and equipment of image processing model | |
| CN114092708A (en) | Feature image processing method, device and storage medium | |
| CN114529801A (en) | Target detection method, device, equipment and storage medium | |
| CN114282664A (en) | Self-feedback model training method and device, road side equipment and cloud control platform | |
| CN118334465A (en) | Training method of image detection model, image detection method and related equipment | |
| CN117746125A (en) | Training method and device of image processing model and electronic equipment | |
| CN113052771B (en) | Image processing method, device, electronic equipment and storage medium | |
| CN112653885B (en) | Video repetition degree acquisition method, electronic equipment and storage medium | |
| CN116580183A (en) | Heterogeneous computing-based target detection method, system and medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |