
CN109918951B - Artificial intelligence processor side channel defense system based on interlayer fusion - Google Patents

Info

Publication number
CN109918951B
Authority
CN
China
Prior art keywords
fusion
artificial intelligence
neural network
intelligence processor
network model
Prior art date
Legal status
Active
Application number
CN201910183870.XA
Other languages
Chinese (zh)
Other versions
CN109918951A (en)
Inventor
Hou Rui (侯锐)
Wang Xingbin (王兴宾)
Meng Dan (孟丹)
Current Assignee
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201910183870.XA
Publication of CN109918951A
Application granted
Publication of CN109918951B
Active legal status
Anticipated expiration legal status

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an artificial intelligence processor side channel defense system based on inter-layer fusion, composed of a general artificial intelligence processor architecture, a fusion control unit, a global on-chip cache unit, and a stripe fusion unit. On the basis of the general artificial intelligence processor architecture, a fusion control unit and a global on-chip cache are added, and the neural network model is fused by combining the stripe fusion method with fusion instructions, so that the artificial intelligence processor achieves higher performance and stronger security. The invention has a novel structure, strong adaptability, good performance, and high security; it can be applied to the security protection of existing artificial intelligence processors and to the model protection of neural networks, and has broad practical value and application prospects.

Description

An artificial intelligence processor side channel defense system based on inter-layer fusion

Technical Field

The invention relates to an artificial intelligence processor side channel defense system based on inter-layer fusion, applied to defending the off-chip DRAM side channel of artificial intelligence processors, and belongs to the field of artificial intelligence processor security.

Background Art

In recent years, artificial intelligence technology has been widely applied in many commercial fields, such as image recognition, speech recognition, and image retrieval. Because deep learning algorithms demand substantial computing power, more and more researchers have turned to the study of deep learning accelerators. To design high-performance, low-power, real-time deep learning accelerators, researchers work across microarchitecture, circuits, materials, and other areas. In 2014, researcher Chen Yunji of the Institute of Computing Technology, Chinese Academy of Sciences, designed DianNao, the first deep learning accelerator, consisting of a computing unit, a control unit, and a storage unit; to build a more general artificial intelligence processor, his team also released the first neural network instruction set, achieving compatibility with various deep learning algorithms through compilation to that instruction set and thereby better acceleration. In 2017, MIT proposed the Eyeriss deep learning accelerator, which uses a row-stationary dataflow for acceleration. Google has likewise launched its neural network tensor processor, the TPU, and deployed it in its internal servers. In May 2018, Google introduced TPU 3.0, whose computing performance is eight times that of TPU 2.0, reaching 1,000 trillion floating-point operations.

Because inference with neural network models requires dedicated high-performance, low-power hardware, more and more neural network models are deployed to run on artificial intelligence processors to improve efficiency and real-time performance. Moreover, in many application scenarios the neural network model, including its structure and weights, needs to be protected: for example, a company may rely on a neural network to provide valuable value-added services, or may offer the model's functionality as a service. In that case the neural network model is important intellectual property of the company.

Recently, the literature has shown that artificial intelligence processors are subject to side channel attacks, including memory side channel attacks and timing side channel attacks. Such attacks can recover the structure of a neural network model, and can then exploit the pruning techniques commonly applied to neural network models to steal the model's weights; furthermore, instruction-driven artificial intelligence processors allow an attacker to recover the structure of the neural network model from captured instructions and to learn where the model is stored. Artificial intelligence processors thus face many attack challenges, and a secure defense method for protecting them is urgently needed.

At present, inter-layer fusion processing of neural networks has been studied at home and abroad, but only to increase the performance of artificial intelligence processors, without any design from a security perspective. The present invention adopts a stripe fusion method combined with customized fusion instructions to strengthen the security of the artificial intelligence processor while also improving its performance.

Summary of the Invention

The technical problem solved by the invention: overcoming the deficiencies of the prior art by providing an artificial intelligence processor side channel defense system based on inter-layer fusion, which reduces the leakage of memory side channel information from the artificial intelligence processor and reduces the data exchanged between the artificial intelligence processor and external DRAM, thereby improving the security of the artificial intelligence processor; the system is characterized by high performance, high security, and convenience.

The technical solution of the invention:

An artificial intelligence processor side channel defense system based on inter-layer fusion comprises a general artificial intelligence processor architecture, a fusion control unit, a global on-chip cache unit, and a stripe fusion unit. On the basis of the general artificial intelligence processor architecture, a fusion control unit is added and inter-layer fusion instructions for the neural network are customized for the artificial intelligence processor; the fusion control unit, together with the fusion instructions, performs fusion processing on the individual layers of the neural network. A global on-chip cache unit is added to the general artificial intelligence processor architecture to cache the intermediate data processed by the artificial intelligence processor, where the intermediate data comprises input feature maps and output feature maps. The stripe fusion method, in cooperation with the fusion control unit, fuses the layers of the neural network, reducing memory side channel information leakage, confusing an attacker's attempt to infer the structure of the neural network model, and improving the security of the artificial intelligence processor.

The fusion control unit consists of fusion control logic and fusion instruction parsing logic: the fusion control logic controls the artificial intelligence processor's intermediate-data fusion operations, while the fusion instruction parsing logic parses fusion instructions and forwards them to the fusion control logic.

The stripe fusion unit is implemented with the stripe fusion method and fusion instructions. The specific process is as follows: the input feature map is divided into stripes using a common stripe partitioning method, with the overlapping portion between stripes determined by the convolution kernel size; the corresponding stripes are then processed stripe by stripe, combining the stripe fusion method with the customized inter-layer fusion instructions to perform inter-layer fusion of the neural network model, thereby enhancing the security of the artificial intelligence processor and improving its performance.

When the stripe fusion unit applies the stripe fusion method, adjacent stripes must share an overlapping portion of data; the number of overlapping rows is determined by the size of the convolution kernel, according to the following formula:

D = K - 1

where D is the number of rows in the overlapping portion and K is the size of the convolution kernel. For example, with a 3×3 kernel (K = 3), adjacent stripes overlap by D = 2 rows.

Compared with the prior art, the advantages of the invention are:

(1) The invention adopts an inter-layer fusion strategy, which reduces memory side channel information leakage from the artificial intelligence processor and the data exchanged between the artificial intelligence processor and external DRAM, and can hide the boundaries between layers of a deep neural network. It provides an effective defense against side channel information leakage through the off-chip DRAM of the artificial intelligence processor, improves the processor's security, and offers high performance, high security, and convenience.

(2) The invention effectively reduces data movement between the artificial intelligence processor and off-chip DRAM, and eliminates the boundaries between layers within a fused group, so that an attacker cannot infer the structure of the neural network model from memory side channel leakage. The invention can be widely used in fields such as artificial intelligence processor security and AIoT security terminals, with significant market benefits and good application prospects. It can also be applied in other artificial intelligence processor designs to improve their security and protect the models running on them.

(3) The stripe fusion method proposed by the invention can be applied to any artificial intelligence processor with an on-chip cache of sufficient size, enhancing its security without changing the hardware architecture of the existing processor.

Brief Description of the Drawings

Figure 1 is a schematic diagram of a general artificial intelligence processor architecture.

The symbols in the figure are as follows:

SoC: system on chip; PE: processing element; IFmap: input feature map; OFmap: output feature map.

Figure 2 shows the artificial intelligence processor side channel defense system based on inter-layer fusion.

The symbols in the figure are as follows: SoC: system on chip; PE: processing element; IFmap: input feature map; OFmap: output feature map; Psum: partial (accumulated) sum; SNin: weight buffer; NBin: input feature map buffer; NBout: output feature map buffer; Pool: pooling operation; Relu: nonlinear activation.

Figure 3 shows the relationship between off-chip DRAM access cycles and on-chip cache size for the AlexNet model.

Figure 4 shows the relationship between off-chip DRAM access cycles and on-chip cache size for the VGG network model.

Figure 5 is a schematic diagram of the stripe fusion method for two convolutional layers: the left panel shows the input feature map, the middle panel the intermediate data processed by the neural network processor, and the right panel the output feature map after fusion.

Detailed Description

The invention is described in detail below with reference to the accompanying drawings and embodiments.

As shown in Figure 1, the hardware architecture of a general artificial intelligence processor includes a CPU (on which the artificial intelligence processor's runtime environment runs), off-chip DRAM, and the artificial intelligence processor itself, which comprises a PCIe controller, a memory controller, a processing element array, on-chip caches, and an on-chip interconnect bus. The artificial intelligence processor also requires the support of a software stack, including the neural network model, the processor's compiler, and the CPU-hosted runtime environment.

To process one layer of a neural network model, the artificial intelligence processor must receive that layer's instructions from the CPU, read the input feature map and the corresponding weight data from off-chip DRAM according to those instructions, move them into the on-chip cache, perform the multiply-accumulate operations in the processing element array, complete the nonlinear and pooling operations, and write the processed data back to off-chip DRAM, finishing the current layer. After all layers of the neural network model have been processed, the artificial intelligence processor produces the model's probability for each class.
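
For illustration only, the per-layer pass just described can be modeled in a few lines of Python. The sketch below assumes a square kernel, valid (no-padding) convolution, ReLU, and 2×2 max pooling; all names are illustrative and this is not the processor's actual datapath:

```python
import numpy as np

def process_layer(ifmap, weights, pool=2):
    """Minimal model of one layer pass: read IFmap/weights (from DRAM),
    multiply-accumulate in the PE array, apply ReLU and pooling, and
    return the OFmap (written back to DRAM in the real flow)."""
    k = weights.shape[0]                              # square kernel, no padding
    h, w = ifmap.shape[0] - k + 1, ifmap.shape[1] - k + 1
    ofmap = np.zeros((h, w))
    for i in range(h):                                # MAC over each window
        for j in range(w):
            ofmap[i, j] = np.sum(ifmap[i:i+k, j:j+k] * weights)
    ofmap = np.maximum(ofmap, 0)                      # nonlinear (ReLU) stage
    ph, pw = h // pool, w // pool                     # pool x pool max pooling
    return ofmap[:ph*pool, :pw*pool].reshape(ph, pool, pw, pool).max(axis=(1, 3))
```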

While performing neural network inference, the artificial intelligence processor generates a large amount of intermediate data, namely feature maps. Because the processor's on-chip cache is too small to hold this much intermediate data, the processor moves intermediate data to off-chip DRAM during processing; this repeated movement between the processor and off-chip DRAM leaks memory side channel information. An attacker can observe the addresses and read/write types of the processor's interactions with off-chip DRAM (memory accesses can be observed by physically probing the bus or by inserting a hardware Trojan) and can infer the structure of the neural network's layers from the observed memory access patterns, namely read-after-write (RAW) dependencies.
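
To make the leakage concrete, the following is a minimal sketch of how an observed DRAM trace could be mined for layer boundaries. It assumes the attacker has already reduced probed bus activity to a list of (operation, address) pairs; the trace format and the simple boundary rule are assumptions for exposition, not a reproduction of any specific published attack:

```python
def infer_layer_boundaries(trace):
    """Given a DRAM access trace [(op, addr), ...] with op in {'R', 'W'},
    report read-after-write events: a read of an address that was
    previously written marks intermediate data crossing the chip
    boundary, i.e. a layer-to-layer hand-off the attacker can count."""
    written, boundaries = set(), []
    for step, (op, addr) in enumerate(trace):
        if op == 'W':
            written.add(addr)
        elif addr in written:              # read-after-write dependency
            boundaries.append(step)
    return boundaries

# Example: weights read, OFmap written, then re-read as the next layer's IFmap.
trace = [('R', 0x100), ('W', 0x800), ('R', 0x800)]
print(infer_layer_boundaries(trace))       # -> [2]
```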

To reduce memory side channel information leakage, appropriate modifications are made to the general artificial intelligence processor architecture to implement the inter-layer fusion processing method. Figure 2 shows the structure of the system of the invention: on the basis of the original general artificial intelligence processor architecture, a global cache unit of suitable size and a fusion control unit are added.

First, the artificial intelligence processor receives fusion instructions from the CPU; then it performs fusion processing on the designated layers according to these instructions; finally, it completes the fusion processing of the entire network and outputs the neural network model's class probabilities. The fusion control unit defines new fusion instructions based on the processor's existing instruction set and performs the inter-layer fusion operations of the neural network according to those instructions, so the control logic of the original artificial intelligence processor need not be modified. If the processor's instruction set cannot express the inter-layer fusion strategy, new fusion control logic must be added to support the new fusion control instructions.
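
As an illustration of what such a customized fusion instruction could look like, the sketch below packs the fusion parameters into a 32-bit word and parses it. The field layout, widths, and names are assumptions made for exposition; the patent does not specify this encoding:

```python
from dataclasses import dataclass

@dataclass
class FusionInstr:
    """Hypothetical fusion instruction: which consecutive layers to fuse
    and how to stripe the input feature map (layout is illustrative)."""
    first_layer: int    # index of the first layer in the fused group
    num_layers: int     # how many consecutive layers are fused
    stripe_rows: int    # output rows of IFmap produced per stripe
    overlap_rows: int   # D = K - 1 rows shared between adjacent stripes

def parse_fusion_instr(word: int) -> FusionInstr:
    """Unpack four 8-bit fields from a 32-bit instruction word."""
    return FusionInstr(
        first_layer=(word >> 24) & 0xFF,
        num_layers=(word >> 16) & 0xFF,
        stripe_rows=(word >> 8) & 0xFF,
        overlap_rows=word & 0xFF,
    )

print(parse_fusion_instr(0x02020102))  # fuse 2 layers starting at layer 2
```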

Statistics were gathered by analyzing various neural network models (including LeNet, AlexNet, GoogLeNet, VGG, and ResNet) and the on-chip cache size required by each model's inter-layer fusion strategies. The size of the global on-chip cache required by the artificial intelligence processor is derived from the number of fusion strategies available at each cache size (a cache size that admits more fusion strategies offers better security; for example, in Figure 4 the 100-150 KB range admits six fusion strategies). Although a larger global on-chip cache permits more inter-layer fusion strategies and yields higher performance and security, the global on-chip cache on the artificial intelligence processor cannot be made too large. Through extensive theoretical analysis and experimental statistics, the invention arrives at a suitable global on-chip cache size for the artificial intelligence processor.
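
The trade-off can be illustrated by counting, for a candidate cache size, how many contiguous layer groups could be fused without spilling intermediates off-chip. The sketch below uses a deliberately simplified footprint model (per-group peak stripe working set) and hypothetical per-layer figures, not measured values:

```python
def feasible_fusion_groups(layer_footprints, buffer_kb):
    """Count contiguous groups of two or more layers whose peak
    intermediate working set fits in a buffer of buffer_kb KB; more
    feasible groups at a given size means more fusion choices and,
    per the text, better obfuscation."""
    n, count = len(layer_footprints), 0
    for start in range(n):
        for end in range(start + 1, n):
            if max(layer_footprints[start:end + 1]) <= buffer_kb:
                count += 1
    return count

footprints = [60, 120, 90, 140, 80]        # hypothetical working sets, in KB
for size in (100, 150, 250):
    print(size, 'KB ->', feasible_fusion_groups(footprints, size), 'strategies')
```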

A fusion control unit and the stripe fusion method are also added: the fusion control unit adds fusion control logic (comprising fusion execution logic and fusion instruction parsing logic) to the control processor of the general artificial intelligence processor, together with customized fusion instructions; the stripe fusion method divides the input feature map into stripes and performs fusion processing with each stripe as the unit.

The whole workflow is as follows (a minimal sketch of the corresponding driver loop follows the list):

(1) The artificial intelligence processor generates fusion instructions according to the selected fusion strategy.

(2) The artificial intelligence processor receives the fusion instructions, parses out the operations to execute via the fusion control unit, and then performs fusion processing on the neural network model according to the stripe fusion method.

(3) After the artificial intelligence processor finishes processing the current stripe, it writes the fused result to off-chip DRAM, then continues with the remaining stripes according to steps (1) and (2) until the fusion processing of the entire feature map is complete, with the final result placed in the corresponding off-chip DRAM.

(4) Following steps (1)-(3), the layers of the neural network model selected for fusion are processed; finally all layers of the model are completed, and the model's probability value for each class is output.
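
The following is the minimal driver-loop sketch referred to above, tying steps (1) through (4) together. The on-chip processing is stubbed out (each fused valid 3×3 layer is modeled only by the one-pixel halo it trims), and every function name is an illustrative stand-in rather than the invention's real interface:

```python
import numpy as np

def issue_fusion_instr(group):
    """Step (1): stand-in for generating a fusion instruction for one group."""
    return {'layers': group, 'stripe_rows': 1}

def process_stripe_on_chip(stripe, instr):
    """Step (2): stand-in for the PE array pushing one stripe through all
    fused layers while intermediates stay in the global on-chip cache."""
    out = stripe
    for _ in instr['layers']:
        out = out[1:-1, 1:-1]              # valid 3x3 layer trims a 1-pixel halo
    return out

def run_network(ifmap, fused_groups):
    dram = []                              # only fused stripe results land here
    data = ifmap
    for group in fused_groups:
        halo = 2 * len(group)              # D = K - 1 = 2 rows per fused 3x3 layer
        rows, top = [], 0
        while top + halo < data.shape[0]:
            instr = issue_fusion_instr(group)
            stripe = data[top: top + instr['stripe_rows'] + halo]
            result = process_stripe_on_chip(stripe, instr)
            dram.append(result)            # step (3): write stripe result off-chip
            rows.append(result)
            top += instr['stripe_rows']
        data = np.vstack(rows)             # OFmap feeds the next fused group
    return data, dram                      # step (4) would yield class probabilities

ofmap, dram = run_network(np.zeros((6, 6)), fused_groups=[['conv1', 'conv2']])
print(ofmap.shape, len(dram))              # (2, 2) 2 -- two stripes, as in Figure 5
```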

The fusion control unit receives and parses the fusion control instructions from the CPU, controls the processing element array to process the fused layers of the neural network accordingly, and allocates the storage locations of the input and output feature maps in the global on-chip cache.

The global on-chip cache stores the intermediate data processed by the artificial intelligence processor, including input feature maps and output feature maps, thereby reducing data exchange between the processor and off-chip DRAM. Its size is determined by the existing neural network models and their corresponding fusion strategies; the on-chip cache of today's general artificial intelligence processors is constrained by area and power budgets and is therefore generally small. Figures 3 and 4 plot on-chip cache size against the number of off-chip DRAM access cycles for the AlexNet and VGG network models, where different fusion strategies determine both the cache size and the access cycle count. As the figures show, when the on-chip cache is between 100 KB and 250 KB, the artificial intelligence processor can choose among more fusion strategies, which means better security. A reasonable on-chip cache size can thus be chosen on the premise that the processor should have more fusion strategies available at that size.

Figure 5 is a schematic diagram of the stripe fusion method for a two-layer convolutional neural network; the invention proposes the stripe fusion method to fuse the individual layers of the neural network. The left panel is the input feature map, of size 6×6; the middle panel is the intermediate data processed by the neural network processor, of size 4×4, which must be stored in the global on-chip cache; the right panel is the fused output feature map, of size 2×2. The convolution kernel used is 3×3. As Figure 5 shows, the input feature map is divided into two stripes: convolving the dashed rectangle in the left panel with the kernel produces the intermediate data in the dashed rectangle of the middle panel, and convolving that intermediate data with the 3×3 kernel produces the first row of data in the right panel. Performing the same operations on the lower stripe yields the second row of data in the right panel.
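
The arithmetic of Figure 5 can be checked numerically. The short NumPy sketch below, with random values standing in for real feature maps and kernels, confirms that two valid 3×3 convolutions shrink a 6×6 input to 4×4 and then 2×2, and that the two stripes reproduce the monolithic result exactly:

```python
import numpy as np

def conv_valid(x, k):
    """Plain valid (no-padding) convolution with a square kernel."""
    n = k.shape[0]
    h, w = x.shape[0] - n + 1, x.shape[1] - n + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i+n, j:j+n] * k)
    return out

rng = np.random.default_rng(0)
ifmap = rng.standard_normal((6, 6))            # 6x6 input, as in Figure 5
k1, k2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

full = conv_valid(conv_valid(ifmap, k1), k2)   # 6x6 -> 4x4 -> 2x2
print(full.shape)                              # (2, 2)

# Stripe fusion: input rows 0-4 yield output row 0, rows 1-5 yield row 1.
# Each fused 3x3 layer contributes D = K - 1 = 2 halo rows, so the input
# stripes share 4 rows and the intermediate stripes share 2 rows.
row0 = conv_valid(conv_valid(ifmap[0:5], k1), k2)
row1 = conv_valid(conv_valid(ifmap[1:6], k1), k2)
print(np.allclose(np.vstack([row0, row1]), full))   # True
```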

The stripe size is chosen by the user according to the artificial intelligence processor's global on-chip cache size and the fusion strategy. However, adjacent stripes must share an overlapping portion of data, shown as the striped region in the middle of the figure; the number of overlapping rows is determined by the size of the convolution kernel, according to the following formula:

D = K - 1

where D is the number of rows in the overlapping portion and K is the size of the convolution kernel.
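
A small helper makes the overlap rule concrete: given the feature-map height, the desired output rows per stripe, and the kernel size, it lists the input row ranges of each stripe, with adjacent ranges re-reading exactly D = K - 1 rows. The function and its parameters are illustrative, not part of the invention's interface:

```python
def stripe_bounds(n_rows, stripe_rows, kernel):
    """Input row ranges [(start, stop), ...] for striping a feature map so
    that each stripe yields stripe_rows output rows of a valid KxK conv."""
    d = kernel - 1                   # rows shared by adjacent stripes: D = K - 1
    bounds, top = [], 0
    while top + d < n_rows:
        bounds.append((top, min(top + stripe_rows + d, n_rows)))
        top += stripe_rows
    return bounds

# A 6-row map with a 3x3 kernel: consecutive ranges overlap by D = 2 rows.
print(stripe_bounds(6, 1, 3))        # [(0, 3), (1, 4), (2, 5), (3, 6)]
print(stripe_bounds(6, 2, 3))        # [(0, 4), (2, 6)]
```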

To quantify the security of the inter-layer fusion method: given a fixed global cache size on the artificial intelligence processor chip, a fusion method that admits more possible fusion strategies is considered more secure.

The stripe fusion method both lets the user flexibly choose the stripe size and lets the user fuse as many layers as the on-chip cache allows; it thus makes full use of the global on-chip cache.

Claims (2)

1. An artificial intelligence processor side channel defense system based on inter-layer fusion, comprising: a general artificial intelligence processor architecture, a fusion control unit, a global on-chip cache unit and a stripe fusion unit; wherein a fusion control unit is added on the basis of the general artificial intelligence processor architecture, inter-layer fusion instructions for the neural network model are customized for the artificial intelligence processor, and the fusion control unit, combined with the fusion instructions, realizes fusion processing of each layer of the neural network model; a global on-chip cache unit is added to the general artificial intelligence processor architecture for caching intermediate data processed by the artificial intelligence processor, wherein the intermediate data comprises an input feature map and an output feature map; and the stripe fusion method cooperates with the fusion control unit and the fusion instructions to perform fusion processing on each layer of the neural network, thereby reducing memory side channel information leakage, confusing an attacker's inference of the structure of the neural network model, and improving the security of the artificial intelligence processor;
the stripe fusion unit is realized by the stripe fusion method and the fusion instructions, wherein the fusion control unit is responsible for parsing and executing the fusion instructions; the specific process is as follows: the overlapping portion between stripes of an input feature map is determined by a common stripe partitioning method together with the convolution kernel size, the input feature map is partitioned into stripes accordingly, and inter-layer fusion processing is then performed on the neural network model stripe by stripe, combining the stripe fusion method with the customized inter-layer fusion instructions, so as to enhance the security of the artificial intelligence processor and improve its performance;
the whole workflow is as follows:
(1) the artificial intelligence processor generates fusion instructions according to the selected fusion strategy;
(2) the artificial intelligence processor receives the fusion instructions, parses the operations to execute via the fusion control unit, and then performs fusion processing on the neural network model according to the stripe fusion method;
(3) after the artificial intelligence processor finishes processing the current stripe, the fused result is written to off-chip DRAM, and the remaining stripes are processed according to steps (1) and (2) until the fusion processing of the whole feature map is finished, with the final result placed in the corresponding off-chip DRAM;
(4) the layers of the neural network model to be fused are processed according to steps (1) to (3), finally all layers of the neural network model are completed, and the probability value of the neural network model for a given class is output.
2. The system of claim 1, wherein: when the stripe fusion unit applies the stripe fusion method, adjacent stripes must share an overlapping portion of data; the number of rows in the overlapping portion is determined by the size of the convolution kernel, according to the following formula:
D = K - 1
where D is the number of rows in the overlapping portion and K is the size of the convolution kernel.
CN201910183870.XA 2019-03-12 2019-03-12 Artificial intelligence processor side channel defense system based on interlayer fusion Active CN109918951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910183870.XA CN109918951B (en) 2019-03-12 2019-03-12 Artificial intelligence processor side channel defense system based on interlayer fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910183870.XA CN109918951B (en) 2019-03-12 2019-03-12 Artificial intelligence processor side channel defense system based on interlayer fusion

Publications (2)

Publication Number Publication Date
CN109918951A (en) 2019-06-21
CN109918951B (en) 2020-09-01

Family

ID=66964329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910183870.XA Active CN109918951B (en) 2019-03-12 2019-03-12 Artificial intelligence processor side channel defense system based on interlayer fusion

Country Status (1)

Country Link
CN (1) CN109918951B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12032690B2 (en) 2022-07-01 2024-07-09 Nxp B.V. Method for protecting a machine learning model from a side channel attack
US12086246B2 (en) 2022-07-01 2024-09-10 Nxp B.V. Method for protecting a machine learning model from a side channel attack
US12282587B2 (en) 2020-02-24 2025-04-22 Samsung Electronics Co., Ltd. Electronic device and control method thereof

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269992B (en) * 2020-06-01 2023-10-20 中国科学院信息工程研究所 Real-time malicious sample detection method based on artificial intelligent processor and electronic device
CN114330679A (en) * 2020-09-28 2022-04-12 中科寒武纪科技股份有限公司 Device, board card and method for fusing neural network and readable storage medium
CN112380534B (en) * 2020-11-12 2022-12-09 上海电力大学 A Hardware Trojan Horse Detection Method Based on Circuit Structure Analysis
US20240289617A1 (en) * 2020-12-25 2024-08-29 Cambricon Technologies Corporation Limited Device, board and method for merging branch structures, and readable storage medium
CN112767269B (en) * 2021-01-18 2022-11-01 北京航空航天大学 Panoramic image defogging method and device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107316058A (en) * 2017-06-15 2017-11-03 国家新闻出版广电总局广播科学研究院 Improve the method for target detection performance by improving target classification and positional accuracy
US10733755B2 (en) * 2017-07-18 2020-08-04 Qualcomm Incorporated Learning geometric differentials for matching 3D models to objects in a 2D image
US11748607B2 (en) * 2017-07-31 2023-09-05 Syntiant Systems and methods for partial digital retraining
CN108509978B (en) * 2018-02-28 2022-06-07 中南大学 Multi-class target detection method and model based on CNN (CNN) multi-level feature fusion
CN108764228A (en) * 2018-05-28 2018-11-06 嘉兴善索智能科技有限公司 Word object detection method in a kind of image
CN109190532A (en) * 2018-08-21 2019-01-11 北京深瞐科技有限公司 It is a kind of based on cloud side fusion face identification method, apparatus and system
CN109285200B (en) * 2018-08-23 2023-03-14 上海连叶智能科技有限公司 Multimode medical image conversion method based on artificial intelligence

Also Published As

Publication number Publication date
CN109918951A (en) 2019-06-21

Similar Documents

Publication Publication Date Title
CN109918951B (en) Artificial intelligence processor side channel defense system based on interlayer fusion
CN111176727B (en) Computing device and computing method
CN108805266B (en) Reconfigurable CNN high-concurrency convolution accelerator
WO2022170997A1 (en) Data processing method and system based on risc-v instruction set, and device and medium
CN105892989B (en) Neural network accelerator and operational method thereof
Yu et al. Instruction driven cross-layer CNN accelerator with winograd transformation on FPGA
Huang et al. Active-routing: Compute on the way for near-data processing
JP7125425B2 (en) Graph Matching for Optimized Deep Network Processing
WO2019196222A1 (en) Processor applied to convolutional neural network
CN111105023B (en) Data stream reconstruction method and reconfigurable data stream processor
Lin et al. ASTRO: Synthesizing application-specific reconfigurable hardware traces to exploit memory-level parallelism
KR102859455B1 (en) Accelerator, method for operating the same, and electronic device including the same
CN116048811A (en) Fully homomorphic encryption neural network reasoning acceleration method and system based on resource multiplexing
Hsieh et al. A high-throughput DPI engine on GPU via algorithm/implementation co-optimization
Fan et al. DT-CGRA: Dual-track coarse-grained reconfigurable architecture for stream applications
CN112269992B (en) Real-time malicious sample detection method based on artificial intelligent processor and electronic device
CN112015473B (en) Sparse Convolutional Neural Network Acceleration Method and System Based on Dataflow Architecture
Wang et al. RV-SCNN: A RISC-V Processor With Customized Instruction Set for SNN and CNN Inference Acceleration on Edge Platforms
CN115328439A (en) An Incremental Matrix Multiplication Accelerator for HPC/AI Applications
Hu et al. Data optimization CNN accelerator design on FPGA
CN118132150A (en) Data access mode derivation method and related products for computational graphs
Bai et al. An OpenCL-based FPGA accelerator with the Winograd’s minimal filtering algorithm for convolution neuron networks
Shang et al. LACS: A high-computational-efficiency accelerator for CNNs
CN114637471B (en) Hierarchical storage system for data stream processors
Jang et al. Eden: Enabling low-power cnn inference on edge devices using prefetcher-assisted nvm systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant