CN119201832B

CN119201832B - An SM4 encryption method based on FPGA

Info

Publication number: CN119201832B
Application number: CN202411064828.3A
Authority: CN
Inventors: 张侠; 安佰春; 马旭; 董科; 程鹏
Original assignee: Qingdao Qingruan Jingzun Microelectronics Technology Co ltd
Current assignee: Qingdao Qingruan Jingzun Microelectronics Technology Co ltd
Priority date: 2024-08-05
Filing date: 2024-08-05
Publication date: 2025-06-17
Anticipated expiration: 2044-08-05
Also published as: CN119201832A

Abstract

The application discloses an SM4 encryption method based on an FPGA and relates to the technical field of data encryption. The method comprises: obtaining plaintext data to be encrypted; configuring a plurality of SM4 encryption cores on an FPGA chip to encrypt the plaintext data in parallel, wherein each SM4 encryption core adopts a pipeline architecture comprising a fetch unit, a round function arithmetic unit and a key adder; integrating a physical-noise-based extraction circuit on the FPGA chip to generate N-bit random numbers as the encryption keys of the SM4 encryption cores; collecting handshake signals to calculate the operation efficiency of each SM4 encryption core; adjusting the FPGA resource allocation among the SM4 encryption cores according to the operation efficiency by means of a priority scheduling algorithm; and, during the encryption operation, balancing power consumption by mixing polynomial operations with logic operations and randomizing timing by inserting random waiting periods. Addressing the weak resistance to differential analysis in the prior art, the application improves resistance to side-channel attacks.

Description

SM4 encryption method based on FPGA

Technical Field

The application relates to the technical field of data encryption, in particular to an SM4 encryption method based on an FPGA.

Background

With the rapid development of information technology and the wide adoption of network applications, data security problems have become increasingly prominent. In information systems and network platforms of all kinds, protecting critical data and user privacy from illegal access and malicious attack has become an important and urgent task. Cryptography is a core means of guaranteeing information security and plays a key role in data encryption, identity authentication and related functions. The SM4 algorithm encrypts data with a 128-bit key, offers high security strength and good performance, and effectively resists a variety of conventional cryptanalytic attacks. However, implementations of the SM4 algorithm still face the threat of side-channel attacks.

A side-channel attack exploits information leaked by the physical implementation of a cryptographic device: the key is inferred, or the ciphertext is broken, by analyzing side-channel information such as the power consumption, electromagnetic radiation and execution time of the device during operation. Common side-channel attacks include power analysis attacks, electromagnetic analysis attacks and timing analysis attacks. These attacks exploit the correlation between physical characteristics and the key in an implementation of a cryptographic algorithm; through statistical analysis and correlation calculations, the key or plaintext can be recovered, which defeats the security of SM4 encryption.

Existing hardware implementations of the SM4 algorithm mainly use devices such as ASICs (application-specific integrated circuits) and FPGAs (field-programmable gate arrays). Although these schemes provide high encryption performance, they have drawbacks in protecting against side-channel attacks. First, most existing schemes use fixed encryption structures and operation modes; lacking flexibility and randomness, they are easy for attackers to analyze and predict. Second, the operation timing and energy consumption during encryption are regular and correlated with the key, so key information may leak. In addition, because of hardware resource and cost limits, existing schemes have difficulty implementing complex side-channel countermeasures and have limited ability to resist advanced attacks.

Disclosure of Invention

To address the limited side-channel resistance of SM4 encryption in the prior art, the application provides an SM4 encryption method based on an FPGA. The method encrypts plaintext data in parallel using a plurality of SM4 encryption cores with a pipeline architecture, together with the further measures described below, and thereby improves resistance to side-channel attacks.

The aim of the application is achieved by the following technical scheme.

The application provides an SM4 encryption method based on an FPGA, in which a plurality of SM4 encryption cores operate in parallel on an FPGA chip. The method comprises the following steps.

Plaintext data input: a high-speed I/O interface of the FPGA, such as LVDS or SerDes, is used to input the plaintext data at high speed. A data input buffer temporarily stores the incoming plaintext data and absorbs the difference between the data input rate and the processing rate of the encryption cores. A data distribution circuit distributes the input plaintext data to the SM4 encryption cores so that the data are processed in parallel.

Multi-core parallel encryption: the number of SM4 encryption cores that can be implemented is determined from the resources of the FPGA chip, such as look-up tables (LUTs), flip-flops (FFs) and block RAM (BRAM). Following a modular design method, the SM4 encryption core is designed as an independent functional module that is easy to replicate and instantiate. A parallel control circuit synchronizes and coordinates the running states of the SM4 encryption cores to achieve parallel encryption.

Pipeline architecture: the round function of the SM4 encryption algorithm is split into several sub-functions, such as the nonlinear transformation and the linear transformation, to obtain a fine-grained pipeline. Multistage pipeline registers are inserted between the fetch unit, the round function arithmetic unit and the key adder to increase the parallelism of data processing. The scheduling and control logic of the pipeline is optimized to minimize data dependences and conflicts between pipeline stages and to improve pipeline efficiency.

Random key generation: physical noise sources, such as resistor thermal noise sources and ring oscillators, are integrated on the FPGA chip to generate random noise signals. A random number extraction circuit samples, quantizes and post-processes the analog noise signal produced by the physical noise source and extracts a random bit sequence. A cryptographically secure random number post-processing algorithm, such as Von Neumann correction or XOR (exclusive-OR) mixing, improves the statistical properties and unpredictability of the random numbers.
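The Von Neumann correction named above can be illustrated with a minimal software sketch. It assumes the sampled noise is already available as an array of raw bits; the function name von_neumann_debias and the one-bit-per-byte layout are illustrative choices, not taken from the application, and the on-chip extraction circuit would perform the same pairing in logic.

```c
#include <stddef.h>
#include <stdint.h>

/* Von Neumann corrector (illustrative sketch, names are assumptions).
 * Raw bits are consumed in non-overlapping pairs:
 *   (0,1) -> emit 0,  (1,0) -> emit 1,  (0,0) and (1,1) -> emit nothing.
 * raw   : raw bits, one bit per element (only bit 0 of each element is used)
 * n_raw : number of raw bits
 * out   : receives the corrected bits; needs room for n_raw / 2 elements
 * Returns the number of bits written to out. */
size_t von_neumann_debias(const uint8_t *raw, size_t n_raw, uint8_t *out)
{
    size_t n_out = 0;
    for (size_t i = 0; i + 1 < n_raw; i += 2) {
        uint8_t a = raw[i] & 1u;
        uint8_t b = raw[i + 1] & 1u;
        if (a != b) {
            out[n_out++] = a;   /* keep the first bit of an unequal pair */
        }
    }
    return n_out;
}
```

The corrector removes bias from independent raw bits at the cost of a variable output rate, so a key generator built on it has to keep sampling until N corrected bits have been collected.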

Handshake signal acquisition: handshake signal lines are arranged between the stages of the pipeline to indicate data validity and processing completion. An asynchronous handshake protocol, such as a request-acknowledge protocol, provides reliable communication and synchronization between pipeline stages. A handshake signal acquisition circuit monitors the state changes of the handshake signals in real time and records their timestamps and durations.

Operation efficiency evaluation: an efficiency evaluation circuit calculates the operation time and data throughput of each pipeline stage from the acquired handshake signal information. A weighted average algorithm combines the operation time and data throughput of the pipeline stages into a comprehensive operation efficiency index. The operation efficiency of the SM4 encryption cores is then evaluated and ranked by threshold comparison and a sorting algorithm.

Dynamic resource scheduling: a resource scheduling controller dynamically adjusts the distribution of FPGA resources among the SM4 encryption cores according to the efficiency evaluation result. A priority scheduling algorithm, such as efficiency-based priority scheduling or fairness-based round-robin scheduling, performs the dynamic allocation. The number and positions of the SM4 encryption cores are changed dynamically by configuring the reconfigurable logic units of the FPGA, so as to adapt to different task demands and resource constraints.
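A minimal sketch of the weighted-average efficiency index and the ranking step follows. It assumes that both metrics have already been normalized to the range 0..1000 (higher is better); the CoreStats structure, the field names and the integer weights are hypothetical and only illustrate the calculation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Per-core measurements derived from the handshake signals (assumed layout). */
typedef struct {
    int      core_id;
    uint32_t speed;       /* 0..1000, higher means shorter operation time */
    uint32_t throughput;  /* 0..1000, higher means more data per window   */
    uint32_t score;       /* comprehensive index, filled by rank_cores()  */
} CoreStats;

/* Sort by score, highest first; ties are broken by core number. */
static int by_score_desc(const void *a, const void *b)
{
    const CoreStats *x = a;
    const CoreStats *y = b;
    if (x->score != y->score) {
        return (x->score < y->score) ? 1 : -1;
    }
    return x->core_id - y->core_id;
}

/* Compute score = (w_speed*speed + w_tp*throughput) / (w_speed + w_tp)
 * for every core, then order the array into a priority list.
 * Small weights are assumed so that the products fit in 32 bits. */
void rank_cores(CoreStats *cores, size_t n, uint32_t w_speed, uint32_t w_tp)
{
    uint32_t w_sum = w_speed + w_tp;
    for (size_t i = 0; i < n; i++) {
        cores[i].score = (w_sum == 0) ? 0
            : (w_speed * cores[i].speed + w_tp * cores[i].throughput) / w_sum;
    }
    qsort(cores, n, sizeof cores[0], by_score_desc);
}
```

With weights 1 and 3, for instance, throughput dominates the index, so a core with moderate latency but high throughput ranks above a fast core that is frequently idle.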

Specifically, after completing one round of encryption operation, each SM4 encryption core packages its operation efficiency data and sends it to the central controller over a dedicated efficiency data channel. The operation efficiency data include key information such as the number of the SM4 encryption core, the operation time and the resource utilization rate. On receipt, the central controller parses and stores the data for subsequent priority scheduling.

The central controller prioritizes the SM4 encryption cores according to the collected operation efficiency data. The ranking algorithm may be, for example, hierarchical ranking based on efficiency thresholds or weighted-score ranking, chosen according to actual requirements. The ranking produces a priority list in which the SM4 encryption cores are ordered from highest to lowest operation efficiency.

The central controller generates resource allocation control instructions according to the priority list. For the SM4 encryption core with the highest operation efficiency, it generates a resource priority guarantee instruction so that the core keeps optimal resource support. For the SM4 encryption core with the next highest operation efficiency, it generates a resource enhancement instruction and allocates more FPGA resources to that core through dynamic partial reconfiguration. For SM4 encryption cores with lower operation efficiency, it generates a resource recycling instruction and reclaims part of their FPGA resources through dynamic partial reconfiguration.

For SM4 encryption cores with extremely low or abnormal operation efficiency, the central controller generates a resource suspension instruction, suspends the core by clock gating and releases its resources. The central controller packages the generated resource allocation control instructions and distributes them to the SM4 encryption cores over a dedicated control instruction channel.

On receiving a resource allocation control instruction, each SM4 encryption core performs the resource adjustment that corresponds to the instruction type. For a resource priority guarantee instruction, the core keeps its current resource configuration and continues to operate efficiently. For a resource enhancement instruction, the core loads a new encryption module into an idle area of the FPGA chip through dynamic partial reconfiguration, which expands its parallelism and throughput. For a resource recycling instruction, the core unloads part of its encryption modules through dynamic partial reconfiguration, releases the FPGA resources and reports the released resources to the central controller. For a resource suspension instruction, the core stops its clock signal by clock gating, enters a low-power state and reports the suspended state to the central controller.

After receiving the released-resource information from an SM4 encryption core, the central controller adds the resources to an allocatable resource pool and reallocates resources according to the priority list and the pool. Released resources are allocated preferentially to the SM4 encryption cores with higher operation efficiency to improve their parallelism and throughput. A new encryption module is loaded, through dynamic partial reconfiguration, into each SM4 encryption core that has been granted resources.
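The mapping from the priority list to the four instruction types can be sketched as follows. The two thresholds and the rule that every core above the low threshold, other than the top-ranked one, receives an enhancement instruction are assumptions made for illustration; the application leaves the exact boundaries open.

```c
#include <stddef.h>
#include <stdint.h>

/* The four resource allocation control instructions described above. */
typedef enum {
    CMD_GUARANTEE,  /* keep the current resource configuration            */
    CMD_ENHANCE,    /* load an additional encryption module               */
    CMD_RECLAIM,    /* unload part of the modules and release resources   */
    CMD_SUSPEND     /* gate the clock and enter the low-power state       */
} ResourceCmd;

/* scores[] is the priority list: efficiency indices sorted from highest
 * to lowest. low_threshold and suspend_threshold are hypothetical tuning
 * parameters with suspend_threshold <= low_threshold.
 * cmds[i] receives the instruction for the core at rank i. */
void assign_commands(const uint32_t *scores, size_t n,
                     uint32_t low_threshold, uint32_t suspend_threshold,
                     ResourceCmd *cmds)
{
    for (size_t rank = 0; rank < n; rank++) {
        if (scores[rank] < suspend_threshold) {
            cmds[rank] = CMD_SUSPEND;      /* extremely low or abnormal */
        } else if (scores[rank] < low_threshold) {
            cmds[rank] = CMD_RECLAIM;      /* lower efficiency */
        } else if (rank == 0) {
            cmds[rank] = CMD_GUARANTEE;    /* highest efficiency */
        } else {
            cmds[rank] = CMD_ENHANCE;      /* next highest efficiency */
        }
    }
}
```

In this sketch a top-ranked core whose score is nevertheless below a threshold is still reclaimed or suspended, which keeps an abnormal core from being protected merely because all other cores are worse.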

Mixed operation mode: a polynomial operation circuit implements polynomial addition, multiplication and similar operations and introduces a nonlinear transformation. A random number generator selects at random between a polynomial operation and a logic operation during encryption, which increases the uncertainty of the computation. The complexity of the polynomial operation is varied by dynamically adjusting the coefficients and degree of the polynomial, which makes power analysis attacks more difficult.

Random operation timing: a random delay circuit inserts random delay units, such as look-up tables (LUTs) and flip-flops, into the critical path of the encryption operation. A pseudo-random number generator controls the insertion position and duration of the random delay and so introduces timing randomness. The parameters of the random delay, such as delay granularity and delay probability, are adjusted dynamically to suit different security and performance requirements.
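As an illustration of the random waiting period, the sketch below draws a wait length from a 16-bit linear feedback shift register (LFSR). The LFSR, its tap positions (x^16 + x^14 + x^13 + x^11 + 1, a commonly used maximal-length polynomial) and the names lfsr_step and random_wait_cycles are assumptions chosen for the example; an LFSR alone is not cryptographically secure, and the described design could instead seed or replace it with bits from the physical noise source.

```c
#include <stdint.h>

/* One step of a 16-bit Fibonacci LFSR with taps at bits 16, 14, 13, 11.
 * The state must be non-zero; a non-zero state never becomes zero. */
uint16_t lfsr_step(uint16_t s)
{
    uint16_t bit = (uint16_t)(((s >> 0) ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1u);
    return (uint16_t)((s >> 1) | (uint16_t)(bit << 15));
}

/* Advance the generator and return a wait of 0..max_wait clock cycles.
 * max_wait plays the role of the adjustable delay granularity. */
unsigned random_wait_cycles(uint16_t *state, unsigned max_wait)
{
    *state = lfsr_step(*state);
    return (unsigned)(*state) % (max_wait + 1u);
}
```

A controller would call random_wait_cycles before starting each round or block and stall the pipeline for the returned number of cycles, so that the position of a given operation in a power trace varies from run to run.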

Further, the SM4 encryption IP core is written in a hardware description language: RTL (register transfer level) description code of the SM4 encryption IP core is written in Verilog HDL or VHDL according to the specification and flow of the SM4 encryption algorithm. The RTL description defines parameters such as the input/output interfaces, the data bit width and the number of encryption rounds, and implements each functional module of SM4 encryption, such as round key generation, nonlinear transformation and linear transformation, with sequential and combinational logic. A structured design method divides the SM4 encryption IP core into several sub-modules, which improves the readability and maintainability of the code. Simulation testing and verification ensure that the RTL description of the SM4 encryption IP core meets the functional and performance requirements of the SM4 encryption algorithm.

Logic synthesis to a gate-level netlist: the RTL description code of the SM4 encryption IP core is input into a logic synthesis tool such as Synopsys Design Compiler or Cadence RTL Compiler. In the tool, the technology library and constraint files of the target FPGA device are set, and design targets such as timing constraints and area constraints are specified. The tool is then run to optimize and map the RTL description of the SM4 encryption IP core and convert it into a logic gate-level netlist. The netlist consists of basic logic cells (AND gates, OR gates, NOT gates, flip-flops and so on) and describes the logic functions and connections of the SM4 encryption IP core. Functional verification and timing analysis are performed on the generated netlist to ensure that the synthesized circuit is functionally equivalent to the original RTL description and meets the timing constraints.

Place and route, mapping to FPGA physical resources: the logic gate-level netlist file of the SM4 encryption IP core is input into an FPGA place-and-route tool such as Xilinx Vivado or Intel Quartus Prime. In the tool, the target FPGA device model is selected and physical constraints such as pin constraints and timing constraints are set. The tool is run to map the logic cells of the SM4 encryption IP core onto physical resources of the FPGA device, such as look-up tables (LUTs) and flip-flops (FFs). Using heuristic algorithms and optimization strategies, the tool places and routes the mapped physical resources, determines their positions and interconnections, and generates the bitstream file. Timing analysis and power analysis are performed on the placed-and-routed circuit to verify that its timing performance and power characteristics meet the design requirements.

Clock tree balancing and asynchronous reset: before FPGA place and route, the clock tree on the FPGA chip is balanced and optimized so that the clock arrives at all physical resources of the SM4 encryption IP core at the same time, which reduces clock skew and jitter. Clock tree synthesis (CTS) automatically generates a balanced clock tree structure, inserts clock buffers and delay units, and adjusts the propagation paths and drive strength of the clock signal. An asynchronous reset signal is added to the top-level module of the SM4 encryption IP core to place the core in a preset initial state. The asynchronous reset signal is connected to the key registers of the SM4 encryption IP core; when the reset signal is asserted, the registers are cleared or set to preset values, so that the SM4 encryption IP core starts from a known state.

Port mapping to form the SM4 encryption core: the placed-and-routed SM4 encryption IP core is connected to the physical resources of the FPGA chip by port mapping to form a complete SM4 encryption core. In the top-level design file of the FPGA device, a plurality of SM4 encryption IP cores are instantiated and their input/output ports are connected to the physical pins or internal interconnect resources of the FPGA. The data input, key input, encryption enable, reset and other control signals of the SM4 encryption IP core are connected to the corresponding FPGA ports through port mapping. The data output of the SM4 encryption IP core is connected to an output port of the FPGA or to the input port of the next-stage module, forming a complete data encryption path.

A plurality of SM4 encryption cores are instantiated and connected in parallel to form a parallel SM4 encryption array, which improves the overall encryption efficiency and throughput. Configuring a plurality of SM4 encryption cores on the FPGA chip covers the complete design flow from high-level language description to physical resource mapping. By exploiting the reconfigurability and parallel processing capability of the FPGA, efficient hardware acceleration of SM4 encryption is achieved and the requirements of high throughput and low latency are met. Clock tree balancing, asynchronous reset and related techniques ensure the reliability and stability of the SM4 encryption core and provide a solid hardware foundation for secure and reliable data encryption.

Further, regarding the selection and application of the logic synthesis tool and the place-and-route tool.

Selection of the logic synthesis tool: at least one logic synthesis tool selected from Synopsys Design Compiler, Cadence RTL Compiler and Mentor Graphics Precision is used to convert the RTL description of the SM4 encryption IP core into a logic gate-level netlist.

Synopsys Design Compiler: a logic synthesis tool widely used in the industry that provides high-quality synthesis results and optimization capabilities. It supports several hardware description languages, such as Verilog, VHDL and SystemVerilog. It provides rich constraint options, such as timing, area and power constraints, so that optimization follows the design requirements. It integrates design rule checking (DRC) and floorplanning functions, which help to detect design problems early.

Cadence RTL Compiler: a logic synthesis tool from Cadence that integrates seamlessly with Cadence's digital IC design flow. It uses advanced synthesis algorithms and optimization strategies to generate a high-quality logic gate-level netlist. It supports multiple hardware description languages and mixed-language designs and offers flexible design entry. It provides comprehensive design constraint management and design space exploration, which help designers converge on a design quickly.

Mentor Graphics Precision: a logic synthesis tool from Mentor Graphics (now Siemens EDA) that is widely used in FPGA and ASIC design. It uses a mature synthesis engine and optimization techniques to generate a logic gate-level netlist optimized for area and performance. It supports several hardware description languages and offers flexible design entry and constraint options. It integrates functional verification and timing analysis to ensure the correctness and performance of the synthesized circuit.

Selection of the place-and-route tool: at least one place-and-route tool selected from Xilinx Vivado, Intel Quartus and Lattice Diamond is used to map the logic gate-level netlist of the SM4 encryption IP core onto the physical resources of the FPGA device and to complete placement and routing.

Xilinx Vivado: an FPGA design suite from Xilinx that supports the design and implementation of the full range of Xilinx FPGA devices. It provides integrated synthesis, place-and-route and verification tools that implement the complete design flow from RTL description to bitstream generation. It uses advanced place-and-route algorithms and optimization strategies, such as clock tree synthesis and congestion-aware placement, to generate high-quality results. It provides rich design constraints and optimization options, such as timing constraints, pin constraints and power optimization, to meet different design requirements.

Intel Quartus: an FPGA design suite from Intel (formerly Altera) that supports the design and implementation of Intel FPGA devices. It provides a complete design flow including synthesis, place and route, timing analysis and power analysis. Its place-and-route engine and optimization algorithms automatically complete the mapping and interconnection of physical resources and generate efficient results. It provides rich design constraints and optimization options, such as timing optimization, logic optimization and power optimization, to meet high-performance and low-power design requirements.

Lattice Diamond: an FPGA design suite from Lattice Semiconductor that supports the design and implementation of Lattice FPGA devices. It provides an integrated design environment including tools for synthesis, place and route, simulation and debugging. Its optimized place-and-route algorithms and strategies automatically complete the mapping and routing of logic cells and generate compact, efficient results. It provides flexible design constraints and optimization options, such as timing constraints, pin constraints and power optimization, to meet different design requirements.

Application flow of the logic synthesis tool and the place-and-route tool: in the selected logic synthesis tool, the RTL description of the SM4 encryption IP core is taken as input, suitable design constraints and optimization options are set, and the synthesis flow is run to generate a logic gate-level netlist. Functional verification and timing analysis are performed on the generated netlist to ensure that the correctness and performance of the synthesized circuit meet the requirements. The synthesized netlist is imported into the selected place-and-route tool, and the target FPGA device model and physical constraints, such as pin assignment and clock constraints, are set. The place-and-route flow is run to map the netlist onto physical resources of the FPGA device, such as look-up tables (LUTs) and flip-flops (FFs), and to place and route those resources. Timing analysis, power analysis and design rule checking are performed on the placed-and-routed circuit to verify that its performance, power consumption and manufacturability meet the design requirements. If the verification results meet the requirements, the final FPGA bitstream file is generated for configuring the FPGA device.

By selecting suitable logic synthesis and place-and-route tools and following a standard design flow, the SM4 encryption IP core can be implemented on the FPGA efficiently. The tools provide powerful synthesis, optimization and place-and-route functions that help designers map the SM4 encryption algorithm onto the FPGA hardware platform quickly and reliably and achieve high-performance, low-power encryption. With reasonable design constraints and optimization strategies, the parallel processing capability and hardware acceleration advantages of the FPGA device are fully exploited, and the requirements of high throughput and low latency are met.

Further, the fetch unit, the round function arithmetic unit and the key adder in the pipeline architecture are designed as follows. The fetch unit adopts an architecture that supports variable-length group fetching: it divides the input plaintext data to be encrypted into a plurality of equal-length data groups according to the group (block) length of the SM4 encryption algorithm and outputs the data groups in sequence to the next pipeline stage. The round function arithmetic unit adopts a programmable architecture that supports dynamic configuration of the number of encryption rounds: the number of SM4 rounds is set through a configurable register, the number of round function operations is controlled according to the configured value, nonlinear and linear transformations are applied to the input data group, and the round function result is output to the next pipeline stage. The key adder adopts an architecture that supports key grouping and group-key prediction: it divides the input encryption key into a plurality of key groups, predicts the next key group from the current key group by table look-up, and XORs the round function result with the predicted key group to generate the encryption result of the current round.

Further, the fetch unit comprises a packet storage area, a circular buffer and a fetch controller. The packet storage area is a storage unit that caches the plaintext data packets to be fetched. Because the packet (block) length of the SM4 encryption algorithm is 128 bits, the bit width of the packet storage area is set to 128 bits so that it stores one complete plaintext data packet per entry. The depth of the packet storage area is chosen according to design and throughput requirements so that it can meet the demands of data caching and reading. The packet storage area is implemented with dual-port RAM (DPRAM): one port writes plaintext data packets and the other port reads them. The circular buffer manages the plaintext data packets with a linked list structure and caches and reads data through linked-list insertion and deletion. The linked list comprises a plurality of nodes, each of which records the start address and packet length of one plaintext data packet.

The data structure of the linked list nodes is as follows:

struct ListNode {
    uint32_t startAddr;     // start address of the plaintext data packet
    uint32_t length;        // length of the plaintext data packet
    struct ListNode *next;  // pointer to the next linked list node
};

The head pointer (head) of the linked list points to the first node and identifies the start of the list. The linked list operation interface provides the insertion and deletion operations: insertNode(ListNode *node) inserts a new node at the tail of the linked list, and deleteNode(ListNode *node) deletes the specified node from the linked list. The circular buffer is implemented with SRAM or distributed RAM and stores and reads plaintext data packets according to the start address and packet length recorded in the linked list nodes. The fetch controller traverses the linked list through the linked list operation interface according to the packet length of the SM4 encryption algorithm and reads the recorded plaintext data packets in sequence.
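A minimal sketch of the two linked-list operations follows. The application names insertNode and deleteNode with a single node argument; passing the head pointer explicitly, the int return value of deleteNode and the decision not to free memory are assumptions made here so that the example is self-contained.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct ListNode {
    uint32_t startAddr;      /* start address of the plaintext data packet */
    uint32_t length;         /* length of the plaintext data packet        */
    struct ListNode *next;   /* pointer to the next linked list node       */
} ListNode;

/* Insert node at the tail of the list; *head is NULL for an empty list. */
void insertNode(ListNode **head, ListNode *node)
{
    ListNode **link = head;
    node->next = NULL;
    while (*link != NULL) {
        link = &(*link)->next;   /* walk to the last next-pointer */
    }
    *link = node;
}

/* Unlink the specified node. Returns 1 if it was found and removed,
 * 0 if it was not in the list. The node itself is not freed. */
int deleteNode(ListNode **head, ListNode *node)
{
    for (ListNode **link = head; *link != NULL; link = &(*link)->next) {
        if (*link == node) {
            *link = node->next;
            node->next = NULL;
            return 1;
        }
    }
    return 0;
}
```

Because packets are read in arrival order, the fetch controller always deletes the head node, so deletion is constant-time in practice even though deleteNode handles the general case.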

The fetch controller works as follows. The linked list head pointer (head) is initialized to null. When a new plaintext data packet arrives, a new node is inserted at the tail of the linked list through insertNode(). The controller traverses the linked list according to the packet length of the SM4 encryption algorithm and reads the plaintext data packets recorded by the nodes in sequence. For each packet, the corresponding data are read from the circular buffer according to the start address and packet length recorded in the node. After a packet has been read, the corresponding node is deleted through deleteNode(); in this way variable-length packets can be read. These steps repeat until the linked list is empty, which indicates that all plaintext data packets have been read.

The fetch controller manages the caching and reading of plaintext data packets in the circular buffer by controlling its write pointer and read pointer. The write pointer (writePtr) points to the next writable location in the circular buffer, and the read pointer (readPtr) points to the next location to be read. When the write pointer catches up with the read pointer, the circular buffer is full, and writing must wait until the read pointer advances. When the read pointer catches up with the write pointer, the circular buffer is empty, and reading must wait until a new data packet has been written.
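The write-pointer and read-pointer behaviour can be sketched as a small ring of 128-bit slots. Since equal pointers occur both when the buffer is full and when it is empty, this sketch keeps an explicit occupancy counter to tell the two cases apart; the counter, the depth of four slots and the names BlockRing, ring_write and ring_read are illustrative assumptions.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 16u   /* one 128-bit SM4 packet */
#define RING_DEPTH  4u    /* illustrative depth     */

typedef struct {
    uint8_t  slots[RING_DEPTH][BLOCK_BYTES];
    uint32_t writePtr;    /* next writable slot          */
    uint32_t readPtr;     /* next slot to be read        */
    uint32_t count;       /* number of occupied slots    */
} BlockRing;

void ring_init(BlockRing *r)
{
    memset(r, 0, sizeof *r);
}

/* Returns 1 on success, 0 if the buffer is full (writer must wait). */
int ring_write(BlockRing *r, const uint8_t block[BLOCK_BYTES])
{
    if (r->count == RING_DEPTH) {
        return 0;
    }
    memcpy(r->slots[r->writePtr], block, BLOCK_BYTES);
    r->writePtr = (r->writePtr + 1u) % RING_DEPTH;
    r->count++;
    return 1;
}

/* Returns 1 on success, 0 if the buffer is empty (reader must wait). */
int ring_read(BlockRing *r, uint8_t block[BLOCK_BYTES])
{
    if (r->count == 0) {
        return 0;
    }
    memcpy(block, r->slots[r->readPtr], BLOCK_BYTES);
    r->readPtr = (r->readPtr + 1u) % RING_DEPTH;
    r->count--;
    return 1;
}
```

A hardware implementation often gets the same effect by making both pointers one bit wider than the address, so that full and empty differ in the extra bit.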

Further, the round function arithmetic unit comprises a round number configuration register, which stores the round number configuration value of the SM4 encryption algorithm and is set according to security and performance requirements. The bit width of the register is chosen according to the maximum number of rounds of the SM4 encryption algorithm; for example, a 6-bit register can support up to 64 rounds of encryption. The register can be configured by software or by hardware, with a configuration interface selected according to actual requirements. The round function arithmetic unit further comprises a plurality of round function operation modules, each of which performs one round of the SM4 encryption algorithm. The input of each round function operation module is a data packet (128 bits) and the corresponding round key, and the output is an encrypted data packet (128 bits). Each round function operation module contains two sub-modules, a nonlinear transformation and a linear transformation. The nonlinear transformation sub-module performs S-box (substitution box) replacement on the input data packet to realize a byte-level nonlinear transformation. The S-box may be implemented with a look-up table (LUT) or a combinational logic circuit. The linear transformation sub-module applies a linear transformation, consisting of cyclic left shifts and exclusive-OR operations, to the result of the S-box replacement. The linear transformation may be implemented with shift registers and exclusive-OR gates. The output of the round function operation module is exclusive-ORed with the corresponding round key to obtain the encryption result of the current round.
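As an illustration of a linear transformation built only from cyclic left shifts and exclusive-OR operations, the following sketch implements the 32-bit linear transformation L of the published SM4 standard. The rotation amounts (2, 10, 18, 24) come from that standard rather than from this description, and the function names are illustrative. The S-box table is omitted here.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Cyclic left shift of a 32-bit word by n bits (0 < n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def sm4_linear_transform(b: int) -> int:
    """L(B) = B xor (B <<< 2) xor (B <<< 10) xor (B <<< 18) xor (B <<< 24)."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)
```

Because the transformation uses only rotations and exclusive-OR, it maps directly onto fixed wiring and XOR gates in hardware, which is consistent with the shift-register and XOR-gate implementation mentioned above.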

The input of the round function selector is connected to the output of each round function operation module, and its output is connected to the next stage of the pipeline. According to the value of the round number configuration register, the round function selector gates the output of the round function operation module corresponding to that round number to the next stage of the pipeline. The round function selector may be implemented with a multiplexer (MUX) or a data selector whose selection signal is controlled by the round number configuration value. The round key generation module generates the round key of each round from the input initial key through the key expansion algorithm of the SM4 encryption algorithm. The key expansion algorithm includes the following steps: the initial key is divided into 4 sub-keys of 32 bits each; a cyclic left shift and an S-box replacement are applied to each sub-key to generate an expanded sub-key; and the expanded sub-key is exclusive-ORed with a fixed parameter to obtain a round key. These steps are repeated until the round keys of all rounds have been generated. The round key generation module provides each generated round key to the corresponding round function operation module for the encryption operation. The round key generation module may implement the key expansion algorithm with a look-up table (LUT) or a combinational logic circuit, generating the round key from the initial key and the round number.
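The first and last steps of the key expansion, splitting the key and obtaining the fixed parameters, can be sketched as follows. It is an assumption that the "fixed parameter" in the text corresponds to the constants CK_i of the published SM4 standard, where byte j of CK_i equals (4i + j) x 7 mod 256; the function names are illustrative. The S-box step is left out.

```python
def split_key(key: bytes) -> list:
    """Split a 128-bit initial key into four 32-bit sub-keys (big-endian)."""
    if len(key) != 16:
        raise ValueError("SM4 uses a 128-bit (16-byte) key")
    return [int.from_bytes(key[i:i + 4], "big") for i in range(0, 16, 4)]


def ck(i: int) -> int:
    """Fixed parameter CK_i of the published SM4 standard, for i in 0..31."""
    word = 0
    for j in range(4):
        word = (word << 8) | (((4 * i + j) * 7) % 256)
    return word
```

Since each constant follows from a closed-form rule, a hardware design can either store the 32 constants in a small ROM or compute them with a byte-wide adder, which matches the LUT or combinational-logic options given above.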

The round function controller controls the transfer of data packets between the pipeline stages according to the flow of the SM4 encryption algorithm, controls the round function selector to switch round functions, and controls the round key generation module to provide round keys, so that the multi-round encryption of the SM4 encryption algorithm is carried out. The round function controller works as follows. A data packet is received from the previous stage of the pipeline and passed to the first round function operation module. According to the value of the round number configuration register, the round function selector is controlled to gate the output of the round function operation module corresponding to that round number to the next stage of the pipeline. The round key generation module is controlled to generate the round key of the corresponding round and provide it to the corresponding round function operation module. The controller waits for the encryption operation of the current round to complete and passes the result to the next stage of the pipeline. These steps repeat until all rounds of encryption are completed. The round function controller may be implemented as a finite state machine (FSM) that controls data packet transfer, round function selection and round key generation according to the round number configuration and the flow of the SM4 encryption algorithm.

Further, the key adder comprises a key packet cache module, which caches the key used by the SM4 encryption algorithm and divides the input initial key into 4 sub-keys for caching. The input of the key packet cache module is a 128-bit initial key, and the output is four 32-bit sub-keys. The key packet cache module is implemented with registers or distributed RAM (DRAM); the initial key is divided into 4 sub-keys according to bit position and stored in the corresponding registers or DRAM. The storage order of the sub-keys can be adjusted according to the requirements of the SM4 encryption algorithm to facilitate the subsequent key expansion.

The key expansion module expands the sub-keys output by the key packet cache module by table look-up and randomly permutes the data bits of the expanded sub-key to generate the round key of the current encryption round. The input of the key expansion module is four 32-bit sub-keys, and the output is a 32-bit round key. The key expansion module uses a look-up table (LUT) to implement the key expansion of the SM4 encryption algorithm, obtaining the expansion result by looking up the values of the sub-keys. The look-up table may be implemented with ROM or a combinational logic circuit, with its contents generated according to the key expansion rules of the SM4 encryption algorithm. The data bits of the expanded sub-key are then randomly permuted to increase the randomness and security of the key. The bit permutation may be driven by a pseudo-random number generator (PRNG) or follow a fixed permutation pattern. The permuted expanded sub-key serves as the round key of the current encryption round and is exclusive-ORed with the output of the round function arithmetic unit.

The key exclusive-OR module performs a bitwise exclusive-OR of the intermediate result output by the round function arithmetic unit and the round key generated by the key expansion module to obtain the output of the current encryption round. Its inputs are the 32-bit intermediate result output by the round function arithmetic unit and the 32-bit round key generated by the key expansion module, and its output is the 32-bit exclusive-OR result. The module uses an exclusive-OR circuit that combines each bit of the intermediate result with the corresponding bit of the round key. The exclusive-OR result is the output of the current encryption round and is passed through the pipeline to the next round for the subsequent encryption operation.

The key controller, according to the round number configuration of the SM4 encryption algorithm, triggers the key expansion module in each encryption round to generate the round key, controls the key exclusive-OR module to execute the round key exclusive-OR, and passes the exclusive-OR result to the next round through the pipeline. The key controller is implemented as a finite state machine (FSM) and decides whether key expansion and key exclusive-OR are finished from the current round and the configured number of SM4 rounds. The key controller works as follows. The round counter is initialized to 0, indicating the first encryption round. The key expansion module is triggered to generate the round key from the current sub-keys. The key exclusive-OR module is controlled to exclusive-OR the intermediate result output by the round function arithmetic unit with the round key. The exclusive-OR result is passed through the pipeline to the next round. The controller then checks whether the current round has reached the SM4 round number configuration value; if not, the round counter is incremented and the next round is executed. If the configuration value has been reached, the encryption process is complete and the final encryption result is output. By comparing the round counter with the SM4 round number configuration value, the key controller controls how many times key expansion and key exclusive-OR are executed and ensures that the specified number of rounds is completed.
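The key controller workflow can be modelled as a small state machine. The following is a minimal sketch only: the state names, the function run_key_controller and the way the key expansion and the intermediate results are passed in as parameters are hypothetical, chosen to make the control flow visible rather than to reproduce the hardware design.

```python
from enum import Enum, auto


class State(Enum):
    EXPAND = auto()  # trigger the key expansion module
    XOR = auto()     # trigger the key exclusive-OR module
    CHECK = auto()   # compare round counter with the configured round number
    DONE = auto()


def run_key_controller(rounds, expand, intermediates):
    """Run `rounds` iterations of expand -> xor -> check.

    expand(r) returns the round key for round r (hypothetical callback);
    intermediates[r] is the round function output for round r.
    """
    if rounds <= 0:
        return []
    state = State.EXPAND
    counter = 0  # round counter, 0 means the first encryption round
    round_key = 0
    outputs = []
    while state is not State.DONE:
        if state is State.EXPAND:
            round_key = expand(counter)
            state = State.XOR
        elif state is State.XOR:
            outputs.append(intermediates[counter] ^ round_key)
            state = State.CHECK
        else:  # State.CHECK
            if counter + 1 < rounds:
                counter += 1
                state = State.EXPAND
            else:
                state = State.DONE
    return outputs
```

Each pass through EXPAND and XOR corresponds to one encryption round, so the number of key expansions always equals the configured round number.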

Furthermore, a three-way handshake verification mechanism is adopted. Independent handshake channels are established among the fetcher, the round function arithmetic unit and the key adder, and the handshake password generated by a pseudo-random number generator is combined with the output of a physical non-deterministic source (PHYS) to realize identity verification and communication synchronization among the internal modules of the SM4 encryption core, which improves the security and reliability of the SM4 algorithm. To collect the handshake signals, a pseudo-random number generator is placed on the FPGA chip and generates pseudo-random numbers at different moments as handshake passwords. The pseudo-random number generator can be implemented with hardware circuits such as a linear feedback shift register (LFSR), which provides the randomness and unpredictability of the handshake password.
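A linear feedback shift register of the kind mentioned above can be sketched in a few lines. The 16-bit width, the tap positions (16, 14, 13, 11) and the seed 0xACE1 are assumptions taken from a commonly used textbook example, not from this application.

```python
def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11 by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_period(seed: int = 0xACE1) -> int:
    """Count the steps until the register returns to its seed."""
    state = seed
    steps = 0
    while True:
        state = lfsr16_step(state)
        steps += 1
        if state == seed:
            return steps
```

With these taps the register cycles through all 65535 non-zero states before repeating. An LFSR is a linear circuit, so its future output can be derived from a short run of observed output; this may be one reason the scheme combines the handshake password with a physical non-deterministic source.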

An independent handshake channel is arranged among the fetcher, the round function arithmetic unit and the key adder. The channel uses a high-speed serial transceiver of the FPGA (such as LVDS or SerDes) to achieve high-speed, low-latency transmission of handshake signals; it is independent of the normal data channel and does not affect the transmission efficiency of SM4 encrypted data. A physical non-deterministic source PHYS is placed at both the transmitting end and the receiving end of the independent channel. PHYS may use random signal sources such as environmental noise or circuit noise; after ADC (analog-to-digital converter) sampling, filtering and quantization, the random characteristics of the physical noise are extracted as an additional factor for handshake password verification. The PHYS output is combined with the pseudo-random handshake password to build a challenge-response verification mechanism based on physical uncertainty, which further strengthens the security of handshake verification.

The handshake process is as follows. After the fetcher finishes fetching a plaintext packet, it obtains handshake password 1 for the current moment from the pseudo-random number generator, packages password 1 with the fetch completion identification code and a timestamp to generate the first handshake signal, and sends the first handshake signal to the round function arithmetic unit through the handshake interface on the independent channel. After receiving the first handshake signal, the round function arithmetic unit verifies password 1 by comparing the received password 1 with the output of its local PHYS and judging the validity of password 1. After verification passes, the round function arithmetic unit obtains handshake password 2 for the next moment from the pseudo-random number generator, packages password 2 with the round function operation completion identification code, a timestamp and the first handshake signal to generate the second handshake signal, and sends the second handshake signal to the key adder through the handshake interface.

After receiving the second handshake signal, the key adder verifies password 2 in the same way as the round function arithmetic unit. After verification passes, the key adder obtains handshake password 3 for the next moment from the pseudo-random number generator, packages password 3 with the encryption completion identification code, a timestamp, the first handshake signal and the second handshake signal to generate the third handshake signal, and returns the third handshake signal to the fetcher through the handshake interface. After receiving the third handshake signal, the fetcher verifies password 3 in the same way as in the two previous steps. After verification passes, the fetcher has completed a full handshake: the identity of each module of the SM4 encryption core has been verified, communication has been synchronized, and the next round of the encryption process can begin. By using pseudo-random handshake passwords and a physical non-deterministic source, the three-way handshake establishes an identity authentication and communication synchronization mechanism between the internal modules of the SM4 encryption core. The randomness and unpredictability of the handshake passwords, together with the physical non-deterministic source, strengthen the security of handshake verification and guard against threats such as man-in-the-middle attacks and replay attacks. The independent handshake channels ensure efficient transmission of handshake signals without affecting the processing efficiency of SM4 encrypted data. The timestamps prevent delay and replay of the handshake signals.

Furthermore, the timestamp information carried in the handshake signals is used to calculate the operation time of each module in the SM4 encryption core, from which the overall operation efficiency of the SM4 encryption core is obtained. Weight coefficients account for the different degrees to which the modules influence encryption efficiency, making the efficiency evaluation more accurate and comprehensive. To analyze the handshake signals, the collected first, second and third handshake signals are parsed and the timestamp information they carry is extracted. The timestamps may use a high-precision format, such as a UNIX timestamp expressed in nanoseconds, to ensure the accuracy of the time measurement.

The operation times of the fetcher, the round function arithmetic unit and the key adder are calculated from the timestamp differences of adjacent handshake signals, as follows. The fetch controller calculates the operation time of the fetcher, T1 = T3_end - T1_start, from the start timestamp T1_start of the first handshake signal and the end timestamp T3_end of the third handshake signal. T1 covers the total time for the fetcher to fetch the plaintext packet, generate the first handshake signal and wait for the third handshake signal to return. The round function controller calculates the operation time of the round function arithmetic unit, T2 = T2_end - T1_end, from the end timestamp T1_end of the first handshake signal and the end timestamp T2_end of the second handshake signal. T2 covers the time for the round function arithmetic unit to receive the first handshake signal, complete the round function operation and generate the second handshake signal. The key controller calculates the operation time of the key adder, T3 = T3_start - T2_end, from the end timestamp T2_end of the second handshake signal and the start timestamp T3_start of the third handshake signal. T3 covers the time for the key adder to receive the second handshake signal, perform the key addition operation and generate the third handshake signal.

The operation efficiency E of the SM4 encryption core is calculated from the operation times T1, T2 and T3 of the fetcher, the round function arithmetic unit and the key adder and from the weight coefficients W1, W2 and W3 that express their influence on encryption efficiency, using the formula E = 1/(W1×T1 + W2×T2 + W3×T3). The values of W1, W2 and W3 can be estimated and adjusted according to factors such as the complexity and the amount of computation of each module in the SM4 encryption core, so as to reflect each module's influence on encryption efficiency. For example, the round function arithmetic unit typically has high complexity and a large amount of computation, so W2 takes a large value, while the operations of the fetcher and the key adder are relatively simple, so W1 and W3 take small values. The formula combines the operation time and the influence weight of each module into a single efficiency figure. The larger the value of E, the higher the operation efficiency of the SM4 encryption core and the faster the encryption.
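The timestamp differences and the efficiency formula can be written out directly. The sketch below assumes integer nanosecond timestamps; the function names and the example weights are illustrative, not values given in the application.

```python
def module_times(t1_start, t1_end, t2_end, t3_start, t3_end):
    """Return (T1, T2, T3) from the handshake timestamps."""
    T1 = t3_end - t1_start   # fetcher: fetch, send signal 1, wait for signal 3
    T2 = t2_end - t1_end     # round function arithmetic unit
    T3 = t3_start - t2_end   # key adder
    return T1, T2, T3


def efficiency(times, weights):
    """E = 1 / (W1*T1 + W2*T2 + W3*T3)."""
    denom = sum(w * t for w, t in zip(weights, times))
    if denom <= 0:
        raise ValueError("weighted operation time must be positive")
    return 1.0 / denom
```

For example, timestamps (0, 10, 50, 60, 70) give T1 = 70, T2 = 40 and T3 = 10; with weights (0.2, 0.6, 0.2) the weighted time is 14 + 24 + 2 = 40, so E = 0.025. A core with a shorter weighted time yields a larger E and would rank higher in an efficiency-based schedule.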

Furthermore, random polynomial operations and logic operations are introduced into each module of the SM4 encryption core, and a mixed operation control module coordinates the operation of all parts, so that encrypted data is randomly encoded and obfuscated as it flows through the SM4 core. The data flow becomes complex and variable and is difficult for an attacker to capture and analyze, which improves the security of the SM4 algorithm. When the fetch controller reads plaintext data, the polynomial coefficient generation module uses an LFSR to generate random values of the variables x, y and z in the polynomial f(x, y, z) = (ax + b)y + cz, and a Bracewell sequence B(n) generates the polynomial coefficients a, b and c. The random seed of the LFSR and the starting value n of B(n) are provided by a seed generator, which ensures the randomness of the polynomial coefficients. The plaintext data is used as the input variable of the polynomial f(x, y, z) and is randomly encoded by the polynomial operation; the encoding result is passed as the output of the fetch controller to the round function arithmetic unit, masking the statistical characteristics of the plaintext. When the round function arithmetic unit performs the nonlinear transformation, a polynomial operation module expands the evaluation of f(x, y, z) using the Horner algorithm. The specific steps are: initialize the result register res to 0; read the coefficients c, b and a of the z, y and x terms in turn from B(n); calculate the x term tmp = a×x + R, where R is a random number generated by the LFSR; calculate the y term tmp = tmp×y + b; and calculate the z term res = tmp×z + S, where S is another random number generated by the LFSR.
The polynomial operation result is taken as an intermediate result of the S-box replacement and is combined with the S-box output through logic operations such as exclusive-OR, AND and OR in a logic operation circuit to obtain an obfuscated nonlinear transformation result. During the linear transformation, logic operations such as exclusive-OR are applied to the linear transformation result and the polynomial operation result to introduce randomizing factors. The polynomial operation and the logic operation are performed alternately, which increases the complexity of the round function operation.
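The listed evaluation steps can be traced with a short sketch. It follows the steps literally, including the random offsets R and S; in those steps the coefficient c is read but does not enter the arithmetic, so the sketch does not use it. Reducing every intermediate value to 32 bits is an assumption about the register width, and the function name is illustrative.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers


def masked_poly(x, y, z, a, b, R, S):
    """Evaluate the three listed steps with random offsets R and S."""
    tmp = (a * x + R) & MASK32    # x term: tmp = a*x + R
    tmp = (tmp * y + b) & MASK32  # y term: tmp = tmp*y + b
    res = (tmp * z + S) & MASK32  # z term: res = tmp*z + S
    return res
```

With a = 2, x = 3 and R = 1 the x term is 7; with y = 4 and b = 5 the y term is 33; with z = 2 and S = 6 the result is 72. Changing only R or S changes the intermediate values and the result for the same inputs, which is the masking effect the description relies on.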

When the key adder generates round keys and executes the round key addition, random polynomial operations and logic operations are introduced to randomly encode and obfuscate the round keys and the round function results. This disturbs the regularity of the key addition operation and increases the randomness of its result, thereby improving the security of the SM4 algorithm. Round keys are generated as follows. The polynomial coefficient generation module uses an LFSR to generate random values of the variable k in the polynomial f(k) = ak^2 + bk + c, and the Bracewell sequence B(n) generates the polynomial coefficients a, b and c. The random seed of the LFSR and the starting value n of B(n) are provided by a seed generator, which ensures the randomness of the polynomial coefficients. Each sub-key in the key group is used as the input variable of the polynomial f(k) and is randomly encoded by the polynomial operation module. The randomly encoded sub-key is passed through the key expansion algorithm to generate a round key, and the expanded round key is exclusive-OR mixed with the random encoding result to obtain a randomized round key. Introducing a random polynomial operation into round key generation increases the randomness of the round key, making it difficult for an attacker to infer the key by analyzing its statistical characteristics.
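The sub-key encoding and the exclusive-OR mixing can be sketched as follows. The 32-bit reduction, the function names and the `expand` callback standing in for the key expansion algorithm are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: sub-keys and results are 32-bit words


def encode_subkey(k, a, b, c):
    """f(k) = a*k^2 + b*k + c, evaluated in Horner form modulo 2^32."""
    return ((a * k + b) * k + c) & MASK32


def randomized_round_key(subkey, a, b, c, expand):
    """Encode the sub-key, expand it, then XOR-mix with the encoding result.

    `expand` is a hypothetical stand-in for the key expansion algorithm.
    """
    encoded = encode_subkey(subkey, a, b, c)
    return (expand(encoded) ^ encoded) & MASK32
```

Horner form needs two multiplications and two additions per sub-key, so in hardware the encoding can reuse one multiplier over two cycles.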

During the round key addition, the round function result and the randomized round key are combined through logic operations such as exclusive-OR, AND and OR in a logic operation circuit, which introduces nonlinear confusion. The logic operation result is then exclusive-OR mixed with a random polynomial operation result to obtain the mixed key addition result. The random polynomial operation result is produced by the polynomial operation module in a manner similar to round key generation: the LFSR and the Bracewell sequence generate random polynomial coefficients, and the round function result is randomly encoded. The key controller controls the timing of the mixed operation so that the random polynomial operation, the logic operation and the round key addition are executed alternately. This disturbs the regularity of the key addition operation and increases the uncertainty of its result, making it difficult for an attacker to infer the key by analyzing the timing pattern of the key addition.

For round iteration, after each round of SM4 encryption is completed, the encryption result of that round is exclusive-OR mixed with a random polynomial operation result and used as the input of the next round. Introducing a random polynomial operation between rounds randomly encodes the result of the previous round, so that the input of each round is randomized, the diffusion of the SM4 algorithm increases, and the avalanche effect becomes more pronounced. After the last round of encryption, the final result is encoded once more by a random polynomial operation to obtain the obfuscated SM4 ciphertext output. This increases the randomness of the ciphertext, so that its statistical characteristics are more uniform and harder for an attacker to exploit. During the polynomial operation, the random numbers R and S generated by the LFSR take part in the calculation of the intermediate results tmp and res, introducing a randomizing factor that dynamically masks the polynomial operation result. In addition, the clock period of the polynomial operation is changed irregularly to alter its power consumption characteristics, which increases the difficulty of power analysis.

After the polynomial operation is finished, the results tmp and res are fed into the logic operation circuit, where exclusive-OR, AND, OR and similar operations further obfuscate the result. The logic operation result is fed back to the input of the polynomial operation, and the polynomial and logic operations are repeated, forming a ring-shaped data path that increases data complexity through repeated mixed iteration. The mixed operation control module provides the control signals for the polynomial operation and the logic operation and cooperates with the fetch controller, the round function controller and the key controller to keep the mixed operation synchronized with the SM4 encryption flow. As a result, encrypted data passing through the SM4 core is processed by the random polynomial mixed operation, and the data flow is complex, variable and difficult for an attacker to capture.

Compared with the prior art, the application has the advantages that:

By configuring a plurality of SM4 encryption cores adopting a pipeline architecture on an FPGA chip, the efficiency and throughput of SM4 encryption are improved by utilizing multi-core parallel processing. The pipeline architecture divides encryption operation into three stages of fetch, round function operation and key addition, thereby realizing fine-grained parallelism of encryption processing, fully playing the advantage of parallel computation of the FPGA chip and improving encryption speed.

Handshake signals between the stages inside each SM4 encryption core are collected; they accurately reflect the operation state of each stage and are used to calculate the real-time operation efficiency of the SM4 encryption core. A priority scheduling algorithm dynamically adjusts the FPGA resource allocation among the SM4 encryption cores according to the operation efficiency, which achieves load balancing and resource optimization during encryption and improves the resource utilization of the FPGA chip.

A random number extraction circuit based on physical noise, integrated on the FPGA chip, generates high-quality random numbers that serve as the encryption keys of the SM4 encryption cores. The physical noise source enhances the randomness and unpredictability of the key and helps resist attacks such as key guessing and key analysis.

A mixed computation of polynomial operations and logic operations is adopted inside the SM4 encryption core. Nonlinear polynomial operations encode and obfuscate the plaintext data and intermediate results, masking the statistical characteristics of the data and increasing the difficulty of power analysis attacks. Mixing polynomial and logic operations further raises the operational complexity and strengthens the resistance of the SM4 encryption algorithm to side channel attacks.

Waiting periods are randomly inserted during the operation of the SM4 encryption core. Randomizing the operation timing introduces time jitter and disturbs the correlation between the encryption operation and energy consumption, which increases the difficulty of side channel attacks such as power analysis and electromagnetic analysis and strengthens the side channel resistance of SM4 encryption.

Drawings

The application will be further described by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. The embodiments are not limiting, in which like numerals represent like structures, wherein:

FIG. 1 is an exemplary flow chart of an FPGA-based SM4 encryption method according to some embodiments of the present application;

FIG. 2 is an exemplary flow chart for configuring an encryption core according to some embodiments of the application;

FIG. 3 is an exemplary flow chart of collecting handshake signals according to some embodiments of the application.

Detailed Description

The method and system provided by the embodiment of the application are described in detail below with reference to the accompanying drawings.

FIG. 1 is an exemplary flowchart of an FPGA-based SM4 encryption method according to some embodiments of the present application. The method includes: obtaining plaintext data to be encrypted and inputting it into an FPGA chip; configuring a plurality of SM4 encryption cores on the FPGA chip and performing parallel encryption operations on the plaintext data with the SM4 encryption cores to obtain ciphertext data, wherein each SM4 encryption core adopts a pipeline architecture comprising a fetcher, a round function arithmetic unit and a key adder, which respectively execute the fetch, round function operation and key addition stages of the encryption operation; generating an N-bit random number with a random number extraction circuit based on physical noise integrated on the FPGA chip, and using the generated N-bit random number as the encryption key of each SM4 encryption core; collecting the handshake signals between the fetcher, the round function arithmetic unit and the key adder inside each SM4 encryption core, the handshake signals reflecting the operation states of the fetcher, the round function arithmetic unit and the key adder; calculating the operation efficiency of each SM4 encryption core from the collected handshake signals; dynamically adjusting the FPGA resource allocation among the SM4 encryption cores with a priority scheduling algorithm according to the operation efficiency; and, during the encryption operation, balancing power consumption through mixed polynomial and logic computation and randomizing timing through random waiting periods, so as to improve the resistance of the SM4 encryption operation to side channel attacks.

To obtain the plaintext data and input it into the FPGA chip, the plaintext data to be encrypted is received from an external device (such as a host computer or a sensor) through an external interface of the FPGA chip, such as GPIO, UART, SPI or I2C. The received plaintext data is transferred over a data bus to a data buffer module inside the FPGA chip, such as a FIFO or RAM. The bit width and depth of the data buffer module are designed according to the packet length and throughput requirements of the SM4 encryption algorithm. The data buffer module is connected to the data input interface of the SM4 encryption core, and data transfer is synchronized through handshake signals. When the amount of plaintext data stored in the data buffer module reaches the input data length of the SM4 encryption core, the fetcher of the SM4 encryption core starts to read plaintext data from the data buffer module.

FIG. 2 is an exemplary flow chart of configuring encryption cores according to some embodiments of the application. To configure multiple SM4 encryption cores on the FPGA chip, the RTL code of the SM4 encryption IP core is written in a hardware description language, Verilog HDL or VHDL, according to the specification of the SM4 encryption algorithm. The code comprises three modules, a fetcher, a round function arithmetic unit and a key adder, connected as a pipeline. The RTL code of the SM4 encryption IP core is synthesized with a logic synthesis tool such as Synopsys Design Compiler, Cadence RTL Compiler or Mentor Graphics Precision to generate the gate-level netlist of the SM4 encryption IP core. During synthesis, the timing, area and power consumption of the SM4 encryption IP core are constrained and optimized according to the characteristics of the FPGA device. A place-and-route tool such as Xilinx Vivado, Intel Quartus or Lattice Diamond is then used to place and route the gate-level netlist, mapping the logic cells of the SM4 encryption IP core onto the physical resources of the FPGA device (such as LUTs and FFs) and completing the connections and routing between them. During place and route, clock constraints and clock tree synthesis balance the clock path delays of the physical resources of the SM4 encryption IP core and ensure clock synchronization.

In the top-level module of the SM4 encryption IP core, the registers and storage units inside the IP core are initialized to a preset state by an asynchronous reset signal. The SM4 encryption IP core is integrated into the top-level design of the FPGA chip through port mapping or by wrapping it in an AXI4 standard interface. Multiple SM4 encryption IP cores are instantiated in the top-level design according to the throughput requirements of SM4 encryption, forming a parallel SM4 encryption core array. The SM4 encryption core array is connected to the other functional modules (such as key management, data caching and external interfaces) through the on-chip interconnect resources of the FPGA chip, such as an AXI4 internal bus, to build a complete SM4 encryption system. Finally, the top-level FPGA design is synthesized, placed and routed, and the resulting FPGA bitstream file is downloaded into the FPGA chip to complete the configuration and deployment of the SM4 encryption system.

A random number extraction circuit based on physical noise is integrated on the FPGA chip to generate N-bit random numbers, which serve as the encryption keys of the SM4 encryption cores in the SM4 encryption operation. The extraction circuit may be, for example, a ring oscillator or a chaotic mapping circuit. The physical noise source may be thermal noise, shot noise, 1/f noise or similar noise inside the FPGA chip. The random number extraction circuit amplifies, filters and quantizes the analog noise signal produced by the physical noise source and applies digital post-processing to obtain a uniformly distributed random bit sequence. A standard test suite such as NIST SP 800-22 is used to test the randomness of the bit sequence and ensure the quality of the random numbers. According to the key length required by the SM4 encryption algorithm, the random bit sequence is grouped and parallelized to obtain an N-bit (e.g., 128-bit) random number. The random bit sequence may be buffered and read using a shift register or a FIFO-like structure. The generated N-bit random number is distributed over a bus or a dedicated interface to each SM4 encryption core on the FPGA chip as its encryption key, and the SM4 encryption core encrypts the plaintext data with the received random key. To improve security, the key may be updated dynamically: a new random key is periodically regenerated by the random number extraction circuit (for example, after a set number of encryption operations) and replaces the encryption key of the SM4 encryption core.
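The grouping and quality-check steps above can be illustrated with a short software model. This is only a sketch under stated assumptions: the text does not name a specific post-processing method, so von Neumann debiasing is swapped in here as one well-known option, and the function names `von_neumann`, `pack_keys` and `monobit_ok` are hypothetical. The check is modeled on the frequency (monobit) test of NIST SP 800-22 and does not replace the full suite.

```python
import math


def von_neumann(raw_bits):
    """Debias a bit stream pairwise: 01 -> 0, 10 -> 1, discard 00 and 11.
    (Assumed post-processing step; the source does not specify one.)"""
    out = []
    for i in range(0, len(raw_bits) - 1, 2):
        a, b = raw_bits[i], raw_bits[i + 1]
        if a != b:
            out.append(a)
    return out


def pack_keys(bits, n_bits=128):
    """Group a bit sequence into n_bits-wide integers, MSB first.
    Bits that do not fill a complete key are dropped."""
    keys = []
    for start in range(0, len(bits) - n_bits + 1, n_bits):
        value = 0
        for bit in bits[start:start + n_bits]:
            value = (value << 1) | bit
        keys.append(value)
    return keys


def monobit_ok(bits, alpha=0.01):
    """Frequency (monobit) check: p = erfc(|S_n| / sqrt(n) / sqrt(2))."""
    s = sum(1 if b else -1 for b in bits)
    p_value = math.erfc(abs(s) / math.sqrt(len(bits)) / math.sqrt(2))
    return p_value >= alpha
```

In hardware the same grouping would typically be a shift register that raises a valid flag every N bits; the Python model only makes the data flow explicit.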

Inside each SM4 encryption core, the fetcher comprises a packet storage area implemented with Block RAM (BRAM) resources on the FPGA chip. BRAM is a high-speed dual-port memory built into the FPGA, with independent read and write ports and a relatively large storage capacity. The bit width and depth of the BRAM are designed according to the packet length of the SM4 encryption algorithm (128 bits) and the throughput requirements of the encryption core. For example, a BRAM with a bit width of 128 and a depth of n may be used, where n is the number of plaintext data packets that need to be buffered. The plaintext data packets to be encrypted are written into the BRAM in sequence, each packet occupying one BRAM storage location. The write port of the packet storage area is connected to the data buffer module and receives plaintext data packets from it; the read port is connected to the fetch controller, which reads the plaintext data packets to be encrypted.

Circular buffer: the linked list nodes of the circular buffer are implemented with on-chip memory resources of the FPGA (e.g., Distributed RAM or shift register LUTs). Each linked list node contains two fields: a start address, which points to the corresponding plaintext data packet in the packet storage area, and a packet length, which records the length of the plaintext data packet and is fixed at 128 bits. The linked list head pointer is implemented with a register (e.g., flip-flops) whose bit width equals the address width of the linked list node storage. Initially, the head pointer points to the first linked list node. The linked list operation interface provides insert and delete operations, implemented with a finite state machine (FSM).

Insert operation: when the data buffer module writes a new plaintext data packet into the packet storage area, it triggers the insert operation of the linked list operation interface. The insert operation takes a free node from the pool of free linked list nodes, writes the start address and packet length of the new plaintext data packet into the node, and inserts the node at the tail of the linked list. At the same time, the linked list head pointer is updated to point to the newly inserted node.

Delete operation: after the fetch controller finishes reading a plaintext data packet, it triggers the delete operation of the linked list operation interface. The delete operation removes the node that has just been read from the linked list and releases it back to the free node pool. At the same time, the linked list head pointer is updated to point to the next node to be read.

Fetch controller: the control logic of the fetch controller is implemented with a finite state machine (FSM) whose states include idle, read and wait. The fetch controller obtains the linked list node to be read through the linked list head pointer and extracts the start address and packet length of the plaintext data packet from that node. Using the start address as the read address, it reads the corresponding plaintext data packet from the BRAM of the packet storage area and sends it to the round function operator of the SM4 encryption core for encryption.

The fetch controller also controls the read pointer of the circular buffer. The read pointer points to the linked list node currently being read; each time a data packet is read, the read pointer is incremented to point to the next node to be read. When the read pointer reaches the tail of the linked list, the fetch controller checks whether the circular buffer is empty. If it is not empty, the read pointer returns to the head of the linked list and reading continues; if it is empty, the fetch controller enters the wait state until a new plaintext data packet is written.

The fetch controller likewise controls the write pointer of the circular buffer, which points to the position where a new node can currently be written. When the data buffer module writes a new plaintext data packet, the fetch controller triggers the insert operation of the linked list operation interface, inserts a new node at the position indicated by the write pointer, and updates the write pointer. The fetch controller determines the state of the circular buffer (empty, full, or neither) from the relative positions of the read pointer and the write pointer. When the circular buffer is empty, the fetch controller sends a data request signal to the data buffer module to request a new plaintext data packet. When the circular buffer is full, the fetch controller stops sending data requests and waits until the round function operator has read a data packet and buffer space has been released.
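The pointer handling described above can be sketched as a small software model. It is a simplification rather than the circuit of the application: the linked list and free node pool are modeled as a fixed-size ring of descriptors, and an occupancy counter (an assumption not stated in the text) distinguishes the full state from the empty state when the two pointers coincide. The class and method names are hypothetical.

```python
class DescriptorRing:
    """Fixed-size ring of (start_address, packet_length) descriptors."""

    def __init__(self, depth):
        self.slots = [None] * depth
        self.depth = depth
        self.rd = 0      # read pointer: next node to be read
        self.wr = 0      # write pointer: next free position
        self.count = 0   # occupancy, separates "full" from "empty"

    def is_empty(self):
        return self.count == 0

    def is_full(self):
        return self.count == self.depth

    def insert(self, start_addr, length=128):
        """Insert operation: returns False when the buffer is full."""
        if self.is_full():
            return False
        self.slots[self.wr] = (start_addr, length)
        self.wr = (self.wr + 1) % self.depth
        self.count += 1
        return True

    def delete(self):
        """Delete operation: returns the node just read, or None when empty."""
        if self.is_empty():
            return None
        node = self.slots[self.rd]
        self.slots[self.rd] = None          # release the node
        self.rd = (self.rd + 1) % self.depth
        self.count -= 1
        return node
```

A fetch controller model would call `insert` whenever the data buffer module writes a packet, call `delete` after each packet is handed to the round function operator, and use `is_empty` and `is_full` to decide when to request data or stall.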

Workflow of the fetcher: initially the linked list is empty, the circular buffer is empty, and both the read pointer and the write pointer point to the head of the linked list. When the data buffer module writes a new plaintext data packet, the fetch controller triggers the insert operation of the linked list operation interface, inserts a new node at the tail of the linked list, and updates the write pointer. The fetch controller reads the first node in the linked list through the head pointer and obtains the start address and packet length of the plaintext data packet to be read. Using the start address, it reads the corresponding plaintext data packet from the BRAM of the packet storage area and sends it to the round function operator for encryption. The fetch controller then triggers the delete operation of the linked list operation interface, removes the node that has been read from the linked list, and updates the read pointer. These steps repeat until the linked list is empty, that is, until all buffered plaintext data packets have been read and encrypted. When the linked list is empty, the fetch controller enters the wait state until a new plaintext data packet is written; the steps are repeated whenever the data buffer module writes new packets.

The round function operator comprises a round number configuration register implemented with on-chip register resources of the FPGA (e.g., flip-flops). The bit width of the round number configuration register depends on the number of rounds of the SM4 encryption algorithm. The standard SM4 encryption algorithm has 32 rounds, so the register is 6 bits wide (2^6 = 64 > 32). The value of the round number configuration register is written by the control module when the SM4 encryption core is initialized and stores the round number configuration of the current encryption operation. The output of the round number configuration register is connected to the round function selector and controls which round function operation module output is selected for the corresponding round.

Round function operation modules: according to the specification of the SM4 encryption algorithm, each round function operation module consists of two parts, a nonlinear transformation and a linear transformation. The nonlinear transformation uses a look-up table (LUT) to implement the S-box substitution. The S-box is a nonlinear substitution table with an 8-bit input and an 8-bit output, and its look-up table is implemented with distributed memory resources of the FPGA (e.g., Distributed RAM or shift register LUTs). The linear transformation consists of exclusive-or (XOR) operations and cyclic shifts and is implemented with the logic resources of the FPGA (e.g., LUTs) and shift registers. The input of each round function operation module is a 128-bit data packet and a 128-bit round key, and the output is a 128-bit encryption result. The round function operation modules are connected in a pipeline, each module corresponding to one round of the SM4 encryption algorithm. The depth of the pipeline is designed according to the number of rounds of the SM4 encryption algorithm and the target clock frequency.
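As a rough software illustration of the two parts of a round, the sketch below follows the published SM4 standard, in which each round key is a 32-bit word and the linear transformation is L(B) = B xor (B <<< 2) xor (B <<< 10) xor (B <<< 18) xor (B <<< 24). The S-box is passed in as a parameter, and `placeholder_sbox` is deliberately NOT the SM4 S-box: it is an arbitrary byte substitution used only so the example runs. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def placeholder_sbox(byte):
    """Illustrative stand-in only; NOT the real SM4 S-box table."""
    return (byte * 7 + 0x63) & 0xFF


def tau(word, sbox):
    """Nonlinear part: apply the S-box to each of the four bytes."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox((word >> shift) & 0xFF) << shift
    return out


def linear_l(b):
    """Linear part L of the SM4 round function."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)


def round_step(state, rk, sbox):
    """One pipeline stage: (X0,X1,X2,X3) -> (X1,X2,X3, X0 ^ T(X1^X2^X3^rk))."""
    x0, x1, x2, x3 = state
    return (x1, x2, x3, x0 ^ linear_l(tau(x1 ^ x2 ^ x3 ^ rk, sbox)))


def crypt(block, round_keys, sbox):
    """Run all rounds, then apply the final word-order reversal."""
    state = tuple(block)
    for rk in round_keys:
        state = round_step(state, rk, sbox)
    return state[::-1]
```

Because of the unbalanced Feistel structure, running `crypt` again with the round keys in reverse order recovers the input regardless of which S-box is plugged in, which is why a single pipeline design can serve both encryption and decryption.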

Round function selector: the round function selector is implemented with a multiplexer (MUX) of the FPGA. Its inputs are connected to the outputs of the round function operation modules, so the input data width is 128 bits multiplied by the number of round function operation modules. Its selection signal comes from the round number configuration register and has a bit width of log2(number of round function operation modules). Its output is connected to the next stage of the pipeline and has a data width of 128 bits. According to the value of the round number configuration register, the round function selector gates the output of the round function operation module for the corresponding round to the next stage of the pipeline.

Round key generation module: the round key generation module is implemented with the logic resources of the FPGA according to the key expansion algorithm of the SM4 encryption algorithm. It takes the 128-bit initial key as input and outputs 32 round keys of 128 bits each. Starting from the initial key, it generates the round key of each round through XOR operations, S-box substitution and cyclic shifts. The generated round keys are supplied to the corresponding round function operation modules over a dedicated data path, and each round function operation module encrypts with the round key for its own round.

Round function controller: the control logic of the round function controller is implemented with a finite state machine (FSM). The round function controller controls the transfer of data packets between the stages of the pipeline according to the flow of the SM4 encryption algorithm. Data synchronization between modules is achieved through handshake signals, such as a data valid signal and a data ready signal. The round function controller generates the selection signal of the round function selector and, according to the current encryption round, gates the output of the corresponding round function operation module to the next stage of the pipeline. It also controls the round key generation module: it triggers round key generation when encryption starts and supplies the generated round keys to the corresponding round function operation modules. Finally, it tracks the progress of each data packet through the pipeline, recording the current encryption round in a counter and comparing it with the value of the round number configuration register to determine whether encryption is complete.

Workflow of the round function operator: when encryption starts, the round function controller resets all round function operation modules and the round key generation module and initializes the value of the round number configuration register to 0. The round function controller triggers the round key generation module to generate 32 round keys from the input initial key and distributes them to the corresponding round function operation modules. The fetch controller reads the first plaintext data packet from the packet storage area and sends it to the first-stage round function operation module of the pipeline. The round function controller moves the data packet through the pipeline stage by stage; each round function operation module the packet passes through performs one round of encryption. According to the current encryption round, the round function controller generates the selection signal of the round function selector and gates the output of the round function operation module for that round to the next stage of the pipeline. This repeats until the data packet has passed through all round function operation modules and the 32 rounds of encryption are complete. The last stage of the pipeline outputs the encrypted ciphertext data packet and stores it in a ciphertext buffer. The whole procedure repeats until all plaintext data packets have been encrypted.

The key adder comprises a key packet buffer module implemented with on-chip memory resources of the FPGA (such as Block RAM or Distributed RAM). According to the key length of the SM4 encryption algorithm (128 bits), the input initial key is divided into four 32-bit sub-keys, which are cached in four 32-bit registers or storage locations. The output of each register or storage location is connected to the input of the key expansion module. The write port of the key packet buffer module is connected to the key input interface and receives the initial key data; the read port is connected to the key expansion module and supplies the sub-key data.

Key expansion module: the key expansion module is implemented with logic resources of the FPGA (such as LUTs and registers) according to the key expansion algorithm of the SM4 encryption algorithm. It takes four 32-bit sub-keys as input and outputs one expanded 32-bit sub-key. The nonlinear transformation in the key expansion (the S-box substitution) is implemented with look-up tables (LUTs); the S-box substitution results are precomputed and stored according to the specification of the SM4 key expansion algorithm to speed up expansion. The linear transformation in the key expansion is implemented with XOR operations and cyclic shifts. The data bits of the expanded sub-key are then randomly permuted to increase the randomness and security of the key. The random permutation sequence may be generated by a linear feedback shift register (LFSR) or another pseudo-random number generator, and the bit permutation may be realized with a crossbar switch or a data selector (multiplexer). The output of the key expansion module is the round key used by the current encryption round and is connected to the input of the key XOR module.
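For orientation, the standard SM4 key expansion that this module builds on can be sketched as follows. The constants FK and CK and the key-schedule linear transformation L'(B) = B xor (B <<< 13) xor (B <<< 23) come from the published SM4 standard. The additional random bit permutation described above is not modeled, and `placeholder_sbox` is again NOT the real SM4 S-box, only a stand-in so the example runs. Function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# System parameter FK and fixed parameters CK from the SM4 standard.
FK = (0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC)
CK = tuple(
    int.from_bytes(bytes(((4 * i + j) * 7) & 0xFF for j in range(4)), "big")
    for i in range(32)
)


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def placeholder_sbox(byte):
    """Illustrative stand-in only; NOT the real SM4 S-box table."""
    return (byte * 7 + 0x63) & 0xFF


def tau(word, sbox):
    """Apply the S-box to each of the four bytes of a 32-bit word."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox((word >> shift) & 0xFF) << shift
    return out


def linear_l_prime(b):
    """Linear transformation L' used in the SM4 key schedule."""
    return b ^ rotl32(b, 13) ^ rotl32(b, 23)


def expand_key(key128, sbox):
    """Split the 128-bit key into four sub-keys and derive 32 round keys."""
    mk = [(key128 >> s) & MASK32 for s in (96, 64, 32, 0)]
    k = [m ^ f for m, f in zip(mk, FK)]
    round_keys = []
    for i in range(32):
        t = tau(k[i + 1] ^ k[i + 2] ^ k[i + 3] ^ CK[i], sbox)
        rk = k[i] ^ linear_l_prime(t)
        k.append(rk)
        round_keys.append(rk)
    return round_keys
```

Each iteration consumes the four most recent 32-bit words and produces one new word, which matches the "four sub-keys in, one expanded sub-key out" interface of the key expansion module.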

Key XOR module: the key XOR module is implemented with logic resources of the FPGA (such as LUTs). Its inputs are the intermediate result (128 bits) output by the round function operator and the round key (32 bits) generated by the key expansion module. The intermediate result is divided into four 32-bit sub-groups, and each sub-group is bitwise XORed with the round key. The XOR result is the output of the current encryption round and is passed through the pipeline to the round function operator of the next round; the output of the key XOR module is connected to the input of the next pipeline stage for this purpose.

Key controller: the control logic of the key controller is implemented with a finite state machine (FSM). The key controller generates control signals according to the round number configuration of the SM4 encryption algorithm and triggers the operations of the key expansion module and the key XOR module. In each encryption round, the key controller first triggers the key expansion module to generate the round key from the sub-keys of the current round, and then triggers the key XOR module to XOR the intermediate result output by the round function operator with that round key. The key controller records the current encryption round in a counter and compares it with the configured number of SM4 encryption rounds to determine whether encryption is complete. If the current round has not reached the configured number of rounds, the key controller proceeds to the next round and again triggers key expansion and key XOR. If the current round has reached the configured number of rounds, the key controller outputs an encryption completion signal indicating that the whole encryption process has finished.
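The sub-group XOR performed by the key XOR module can be written out as a few lines of software. This is a sketch of the operation as described in the text (a 128-bit intermediate result split into four 32-bit sub-groups, each XORed with the same 32-bit round key); the function name `key_xor` and the big-endian word order are assumptions.

```python
MASK32 = 0xFFFFFFFF


def key_xor(intermediate128, round_key32):
    """XOR each 32-bit sub-group of a 128-bit value with one 32-bit round key."""
    out = 0
    for shift in (96, 64, 32, 0):
        word = (intermediate128 >> shift) & MASK32
        out |= (word ^ (round_key32 & MASK32)) << shift
    return out
```

Since XOR is its own inverse, applying `key_xor` twice with the same round key returns the original intermediate result.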

Workflow of the key adder: when encryption begins, the key controller resets the key expansion module and the key XOR module and initializes the round counter to 0. The key input interface writes the initial key into the key packet buffer module, where it is divided into four sub-keys and cached. The key controller triggers the key expansion module to generate the round key of the first round from the first sub-key. The round function operator performs the first round of encryption on the plaintext data packet and outputs an intermediate result. The key controller triggers the key XOR module to XOR this intermediate result with the round key of the first round, which yields the output of the first round, and passes that output through the pipeline to the round function operator of the next round. The key controller then checks whether the current round has reached the configured number of SM4 encryption rounds. If not, it proceeds to the next round and repeats the steps above. If so, it outputs an encryption completion signal indicating that the whole encryption process has finished.

The operation efficiency of each SM4 encryption core is calculated, and resources are scheduled dynamically according to that efficiency. This includes implementing a pseudo-random number generator on the FPGA chip with a linear feedback shift register (LFSR). The LFSR consists of cascaded D flip-flops and XOR gates; with a suitable feedback polynomial it generates a pseudo-random sequence with a long period. The number of LFSR stages is chosen according to the required pseudo-random number bit width; for example, a 32-stage LFSR can generate 32-bit pseudo-random numbers. The LFSR is instantiated in the FPGA with logic resources (such as LUTs and FFs) and is driven by a clock signal, producing a new pseudo-random number from the feedback polynomial every clock cycle. The output of the LFSR is used as the handshake password for the fetcher, the round function operator and the key adder.
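A small software model makes the shift-and-feedback behavior concrete. The sketch below uses a 16-stage Fibonacci LFSR with the feedback polynomial x^16 + x^14 + x^13 + x^11 + 1, chosen here only because its period is short enough to verify by exhaustive stepping; the seed `0xACE1` and the function names are illustrative assumptions, and a 32-stage design would follow the same pattern with different taps.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR by one clock.
    Taps correspond to x^16 + x^14 + x^13 + x^11 + 1."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_period(seed=0xACE1):
    """Count clocks until the register returns to its seed (seed must be nonzero)."""
    state = lfsr16_step(seed)
    steps = 1
    while state != seed:
        state = lfsr16_step(state)
        steps += 1
    return steps
```

The all-zero state maps to itself, so the register must never be seeded with zero; every nonzero seed lies on the single cycle of length 2^16 - 1.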

Fig. 3 is an exemplary flow chart of collecting handshake signals according to some embodiments of the application. Independent handshake channels are implemented on the FPGA chip with high-speed serial transceivers (e.g., LVDS, SERDES). A pair of high-speed serial transceivers is placed between each pair of adjacent modules among the fetcher, the round function operator and the key adder to send and receive handshake signals. The transceiver at the transmitting end converts the parallel handshake data into serial data and sends it to the receiving end over a differential signal line. The transceiver at the receiving end converts the received serial data back into parallel data, restoring the original handshake signal. The handshake channels are independent of the data channels, so handshake signals are transmitted at high speed without affecting the bandwidth or latency of the data channels.

Setting a physical non-deterministic source (PHYS): a PHYS is placed at both the transmitting end and the receiving end of each handshake channel to generate random noise. The PHYS may be implemented with a ring oscillator, which is formed by cascading an odd number of inverters and produces an unstable oscillation signal; sampling the output of the ring oscillator yields random noise. The PHYS may also be implemented with a quantum random number generator, which generates true random numbers from quantum noise sources (e.g., shot noise or photon noise). The output of the PHYS is combined with the handshake password generated by the pseudo-random number generator to form the final handshake password, which increases the randomness and unpredictability of the handshake password.

Generating and sending the first handshake signal: after the fetcher finishes fetching a plaintext packet, it obtains handshake password 1 for the current moment from the pseudo-random number generator. Password 1, a fetch completion identification code and a timestamp are packaged into the first handshake signal. The identification code identifies the fetch completion event, and the timestamp records the time at which the fetch completed. The first handshake signal is sent to the round function operator through the high-speed serial transceiver on the handshake channel.

Generating and sending the second handshake signal: after the round function operator receives the first handshake signal, it verifies password 1. After the verification passes, it obtains handshake password 2 for the next moment from the pseudo-random number generator. Password 2, a round function operation completion identification code, a timestamp and the first handshake signal are packaged into the second handshake signal. The identification code identifies the round function operation completion event, and the timestamp records the time at which the operation completed. The second handshake signal is sent to the key adder through the high-speed serial transceiver on the handshake channel.

Generating and sending the third handshake signal: after the key adder receives the second handshake signal, it verifies password 2. After the verification passes, it obtains handshake password 3 for the next moment from the pseudo-random number generator. Password 3, an encryption completion identification code, a timestamp, the first handshake signal and the second handshake signal are packaged into the third handshake signal. The identification code identifies the encryption completion event, and the timestamp records the time at which encryption completed. The third handshake signal is returned to the fetcher through the high-speed serial transceiver on the handshake channel.
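The nesting of the three handshake signals can be modeled as a simple chained record. This is only an illustrative data-structure sketch: the field names, the event identifier strings, the `next_signal` helper and the plain equality check on the password are assumptions, and the real signals are bit-level frames on a serial link. Because the second signal already embeds the first, wrapping the second inside the third carries both earlier signals, as the text requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Handshake:
    password: int                     # handshake password for this stage
    event_id: str                     # hypothetical event identifier
    timestamp: int                    # completion time, e.g. in clock cycles
    previous: Optional["Handshake"] = None   # embedded earlier signal


def timestamps(signal):
    """Collect timestamps from the oldest embedded signal to the newest."""
    out = []
    while signal is not None:
        out.append(signal.timestamp)
        signal = signal.previous
    return out[::-1]


def next_signal(received, expected_password, new_password, event_id, now):
    """Verify the received password, then wrap the received signal in a new one."""
    if received.password != expected_password:
        raise ValueError("handshake verification failed")
    return Handshake(new_password, event_id, now, previous=received)
```

A receiver that rejects a signal would, per the text, report the failure to the sender and request retransmission; here that case is reduced to raising an exception.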

Verifying the handshake signals: in the handshake interface, the received handshake password is compared with the output of the PHYS to determine whether the password is valid. If the received handshake password is consistent with the output of the PHYS, the password is valid and the handshake verification passes. If it is inconsistent, the password is invalid and the handshake verification fails; the handshake interface then reports the failure to the transmitting end and requests that the handshake signal be resent. After the fetcher receives the third handshake signal, it verifies password 3. A successful verification indicates that one complete handshake has finished; the fetcher then enters the next round of the encryption flow and starts a new round of handshaking.

Calculating the operation times: the calculation module parses the collected first, second and third handshake signals and extracts their timestamp information. The operation times of the fetcher, the round function operator and the key adder are calculated from the timestamp differences of adjacent handshake signals. The fetch controller calculates the operation time of the fetcher, T1 = t3_end - t1_start, from the start timestamp t1_start of the first handshake signal and the end timestamp t3_end of the third handshake signal. The round function controller calculates the operation time of the round function operator, T2 = t2_end - t1_end, from the end timestamp t1_end of the first handshake signal and the end timestamp t2_end of the second handshake signal. The key controller calculates the operation time of the key adder, T3 = t3_start - t2_end, from the end timestamp t2_end of the second handshake signal and the start timestamp t3_start of the third handshake signal.

Calculating the operation efficiency of the SM4 encryption core: timestamp information is extracted from the collected handshake signals, and the operation time T1 of the fetcher, the operation time T2 of the round function operator and the operation time T3 of the key adder are calculated. Weight coefficients W1, W2 and W3 are assigned to the fetcher, the round function operator and the key adder of the SM4 encryption core, respectively; each weight coefficient represents the degree to which the module influences encryption efficiency. Suitable weight values are determined through experimental testing and statistical analysis. A set of test vectors, comprising plaintext data of different lengths and characteristics, is designed and encrypted by the SM4 encryption core. During encryption, the operation times T1, T2 and T3 of the modules and the encryption time T of the whole core are recorded, and repeated tests yield a set of sample data. The weight coefficients W1, W2 and W3 are then fitted with the least squares method or another mathematical optimization method so that the error between the weighted operation time and the overall encryption time is minimized. The weight coefficients must satisfy constraints, for example that they sum to 1 and are non-negative. From the operation times T1, T2, T3 and the weight coefficients W1, W2, W3, the operation efficiency E of the SM4 encryption core is calculated by the formula E = 1/(W1×T1 + W2×T2 + W3×T3). The calculation of E can be performed by a floating-point unit in the FPGA or simplified with methods such as table look-up.
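As a worked illustration of the efficiency formula, the sketch below derives T1, T2 and T3 from handshake timestamps using the definitions given for the operation times and then evaluates E = 1/(W1×T1 + W2×T2 + W3×T3). The function names, the sample timestamps and the sample weights are assumptions chosen for the example; the least squares fitting of the weights is not shown.

```python
def operation_times(t1_start, t1_end, t2_end, t3_start, t3_end):
    """Operation times of the fetcher, round function operator and key adder."""
    t_fetch = t3_end - t1_start    # T1
    t_round = t2_end - t1_end      # T2
    t_key = t3_start - t2_end      # T3
    return (t_fetch, t_round, t_key)


def efficiency(times, weights):
    """E = 1 / (W1*T1 + W2*T2 + W3*T3), with non-negative weights summing to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return 1.0 / sum(w * t for w, t in zip(weights, times))
```

For example, timestamps (0, 10, 50, 60, 70) give T1 = 70, T2 = 40 and T3 = 10; with weights (0.5, 0.25, 0.25) the weighted time is 47.5, so E = 1/47.5.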

Dynamic resource scheduling optimization: according to the calculated operation efficiency E of each SM4 encryption core, FPGA resource allocation is adjusted dynamically with a priority scheduling algorithm. A resource management module is designed to monitor the operation efficiency of each SM4 encryption core and allocate resources according to the priority scheduling algorithm. The resource management module maintains a priority queue in which SM4 encryption cores with higher operation efficiency are placed toward the front and cores with lower operation efficiency toward the back. When logic resources, storage resources, DSP resources or other resources are available in the FPGA, the resource management module allocates them to the SM4 encryption cores in the order of the priority queue. SM4 encryption cores with higher operation efficiency are allocated more resources, for example through increased parallelism or a higher clock frequency, to raise their processing performance. For SM4 encryption cores with lower operation efficiency, the allocation is reduced as appropriate, for example through reduced parallelism or a lower clock frequency, so that resources go preferentially to the more efficient cores. The resource management module adjusts the allocation dynamically according to actual resource usage and the requirements of the encryption tasks; when the encryption tasks change or a resource bottleneck appears, the priority queue and the resource allocation strategy are adjusted promptly.

The priority scheduling algorithm can be selected according to actual requirements. Common priority scheduling algorithms include the following. Weight-based priority scheduling: different weights are assigned according to the operation efficiency of each SM4 encryption core, and a higher weight means a higher priority. Deadline-based priority scheduling: the SM4 encryption cores are prioritized according to the deadlines of their encryption tasks, and an earlier deadline means a higher priority. Fairness-based priority scheduling: the resource usage of each SM4 encryption core is taken into account to keep the allocation fair and prevent individual cores from holding resources for a long time. Dynamic resource scheduling optimization balances the processing performance of the SM4 encryption cores and thereby improves overall encryption throughput and efficiency.
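A minimal software sketch of the efficiency-ordered priority queue is given below. It covers only the simplest policy, in which cores are served strictly in order of efficiency and each core receives at most a fixed number of resource units; the unit abstraction, the cap `max_per_core` and the function names are assumptions made for the example, not details from the application.

```python
import heapq


def priority_order(efficiencies):
    """Return core names sorted from highest to lowest operation efficiency."""
    # heapq is a min-heap, so efficiencies are negated to pop the largest first.
    heap = [(-e, core) for core, e in efficiencies.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


def allocate(efficiencies, free_units, max_per_core):
    """Hand out free resource units in priority order, capped per core."""
    grants = {core: 0 for core in efficiencies}
    for core in priority_order(efficiencies):
        give = min(max_per_core, free_units)
        grants[core] = give
        free_units -= give
    return grants
```

With efficiencies {"core0": 0.02, "core1": 0.05, "core2": 0.03}, five free units and a cap of two, core1 and core2 each receive two units and core0 receives the remaining one. A deadline-based or fairness-based policy would change only the key used to order the heap.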

Hardware implementation: a resource management module, comprising a priority queue, a resource allocation controller and the like, is designed and implemented in the FPGA. The module is connected to each SM4 encryption core through a bus or a dedicated interface to allocate and control resources dynamically. The SM4 encryption core must be designed for resource scalability and reconfigurability in order to support dynamic resource scheduling. Each SM4 encryption core includes a resource control interface that receives control signals from the resource management module and dynamically adjusts internal resource usage such as parallelism and clock frequency. The resource management module and the SM4 encryption cores are described in a hardware description language (such as Verilog or VHDL), then synthesized, placed and routed to generate the FPGA bitstream file.

Software control: a soft-core processor (such as ARM or MicroBlaze) is integrated in the FPGA and runs a software program for resource scheduling and control. The program reads the operation efficiency of each SM4 encryption core, calculates the priorities, and passes the priority information to the resource management module. It dynamically adjusts the priority scheduling algorithm and the resource allocation strategy according to the requirements of the encryption tasks and the state of the system. The program communicates with the resource management module and the SM4 encryption cores through configuration registers or a control interface to control dynamic resource scheduling.

When the SM4 encryption core performs an encryption operation, the plaintext data and intermediate results are mixed with random polynomial operations; the polynomial operation is expanded by the Horner algorithm and combined with logical operations, which enhances the capability of resisting side channel attacks.

Random polynomial coding of the plaintext data: the data acquisition controller reads plaintext data blocks from an external interface or data buffer, each block being 128 bits long. Each block is divided into several sub-blocks whose length can be set according to the requirements of the polynomial operation, for example 32 bits or 64 bits. Each sub-block is taken in turn as an input variable of the polynomial operation, and random polynomial coding is applied to it.

Random variable generation: a linear feedback shift register (LFSR) generates the random values of the variables x, y and z in the polynomial expression f(x, y, z) = (ax+b)y+cz. A suitable LFSR feedback polynomial and initial state are selected according to the security requirements of the polynomial operation. Common feedback polynomials are x^16+x^14+x^13+x^11+1 (16-bit LFSR), x^32+x^22+x^2+x+1 (32-bit LFSR) and x^64+x^63+x^61+x^60+1 (64-bit LFSR). The initial state of the LFSR is set to a key or random seed so that the generated sequence has sufficient randomness and unpredictability. The shift and feedback operations of the LFSR produce a pseudo-random sequence that supplies the values of x, y and z. The sequence is truncated to the number of bits required by the polynomial expression to obtain the required random variable values.
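As an illustration of the random variable generation step, the following sketch models the 16-bit LFSR with feedback polynomial x^16+x^14+x^13+x^11+1 in software. It assumes a Fibonacci (external XOR) configuration that shifts right, a 16-bit word taken after every 16 shifts, and the example seed 0xACE1; these choices and the function names are illustrative assumptions, and a hardware design would implement the same recurrence with flip-flops and XOR gates.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR for x^16 + x^14 + x^13 + x^11 + 1 by one bit."""
    # For a right-shifting register, taps 16, 14, 13 and 11 are bits 0, 2, 3 and 5.
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def lfsr16_words(seed, count):
    """Return `count` 16-bit pseudo-random words, taking one word every 16 shifts."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("an all-zero LFSR state never changes")
    words = []
    for _ in range(count):
        for _ in range(16):
            state = lfsr16_step(state)
        words.append(state)
    return words

# Example: draw two random variable values from a hypothetical seed.
y, z = lfsr16_words(0xACE1, 2)
print(hex(y), hex(z))
```

The all-zero state is rejected because the XOR feedback can never leave it, which is why the seed must be a non-zero key or random value.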

Random coefficient generation: the coefficients a, b and c of the terms of f(x, y, z) are generated from the Bracewell sequence B(n). The Bracewell sequence is a pseudo-random sequence with good randomness and uniform distribution characteristics. It is generated as follows: an initial value B(0) is chosen, and subsequent values are produced by the recurrence B(n) = (B(n-1) × p) mod q, where p and q are two coprime positive integers, typically p = 31 and q = 127. The sequence is truncated to the number of bits required by the polynomial expression to obtain the random coefficients a, b and c. The generated coefficients are multiplied with the corresponding terms of the polynomial expression to obtain the complete polynomial.

Polynomial operation and encoding: the plaintext sub-block, as the input variable x, together with the random variables y and z and the random coefficients a, b and c, is substituted into the polynomial expression f(x, y, z) = (ax+b)y+cz. A polynomial operation unit evaluates the polynomial to produce the encoded result. The unit can be built from hardware resources such as multipliers, adders and shift registers to improve operation efficiency. To reduce hardware resource consumption, the Horner algorithm can be used to expand the polynomial operation into a series of multiply-add operations. The corresponding polynomial operation is performed for each plaintext sub-block to obtain its encoded result.
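The coefficient recurrence and the per-sub-block encoding can be sketched as follows. The sketch assumes 32-bit sub-blocks and arithmetic truncated to 32 bits (the application does not state the arithmetic width), and the names `bracewell_sequence` and `encode_block` are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumed sub-block width of 32 bits

def bracewell_sequence(b0, count, p=31, q=127):
    """Return B(1)..B(count) for the recurrence B(n) = (B(n-1) * p) mod q."""
    values = []
    b = b0
    for _ in range(count):
        b = (b * p) % q
        values.append(b)
    return values

def encode_block(block128, y, z, coeffs):
    """Apply f(x, y, z) = (a*x + b)*y + c*z to each 32-bit sub-block of a 128-bit block."""
    a, b, c = coeffs
    encoded = 0
    for i in range(4):
        x = (block128 >> (32 * i)) & MASK32            # i-th plaintext sub-block
        fx = ((a * x + b) * y + c * z) & MASK32        # truncated to 32 bits (assumption)
        encoded |= fx << (32 * i)                      # concatenate the encoded sub-blocks
    return encoded

a, b, c = bracewell_sequence(1, 3)   # B(0) = 1 gives 31, 72, 73
print(a, b, c, hex(encode_block(1, 2, 3, (a, b, c))))
```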

Outputting the coding result: the results obtained by random polynomial coding of each plaintext sub-block are concatenated to form a complete encoded plaintext data block. The encoded block is passed, as the output of the data acquisition controller, to the round function operator for the subsequent encryption operation. Random polynomial coding masks the statistical characteristics of the plaintext data, increases the randomness and unpredictability of the data, and improves the capability of resisting side channel attacks.

Horner expansion in the round function operator: the random polynomial f(x, y, z) = (ax+b)y+cz is rewritten in Horner form by arranging its terms in descending powers of the variable x, which gives f(x, y, z) = (ay)x+by+cz. In this form the polynomial is evaluated as a short chain of multiply-add operations, which improves calculation efficiency.

Calculation of the polynomial in the round function operator: the value of the polynomial is calculated by loop iteration. First, ay is calculated, where a is a polynomial coefficient and y is a random variable. Then ay is multiplied by the variable x to obtain (ay)x. Next, the term by is added to obtain (ay)x+by. Finally, the term cz is added to obtain the final result of the polynomial f(x, y, z).
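A short numerical check can confirm that evaluating the polynomial as a chain of multiply-add steps ordered by powers of x, that is (a·y)·x + b·y + c·z, gives the same value as the direct form (ax+b)y+cz. The 32-bit truncation and the function names are assumptions made for this sketch.

```python
MASK32 = 0xFFFFFFFF  # assumed datapath width

def f_direct(x, y, z, a, b, c):
    """Direct evaluation of f(x, y, z) = (a*x + b)*y + c*z."""
    return ((a * x + b) * y + c * z) & MASK32

def f_multiply_add(x, y, z, a, b, c):
    """Step-by-step evaluation, one multiply or multiply-add per step."""
    acc = (a * y) & MASK32            # coefficient of x
    acc = (acc * x) & MASK32          # (a*y)*x
    acc = (acc + b * y) & MASK32      # ... + b*y
    acc = (acc + c * z) & MASK32      # ... + c*z
    return acc

print(f_direct(5, 7, 11, 31, 72, 73), f_multiply_add(5, 7, 11, 31, 72, 73))
```

Because every step is reduced modulo 2^32, the two forms agree for all inputs, so an accumulator of fixed width can be reused across the steps.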

Mixing of intermediate results: in each iteration of the polynomial computation, the intermediate result is mixed with other operations of the round function, which increases the complexity and randomness of the computation. The intermediate result may be mixed with the S-box substitution, so that it undergoes the nonlinear transformation, or with the linear transformation, so that it is linearly mixed. Mixing the polynomial computation with other operations of the round function increases computational complexity and improves the capability of resisting side channel attacks.

Hardware implementation: the polynomial operation unit is implemented with DSP resources or logic resources in the FPGA and supports the expansion and calculation of the Horner algorithm. A finite state machine (FSM) controls the flow of the polynomial computation and coordinates the execution of the individual steps. The polynomial operation unit is integrated with other modules, such as the S-box and the linear transformation, to form a complete round function operator. Pipelining improves the parallelism and throughput of the polynomial computation and reduces computation delay.

Random encoding of the key in the key adder. Grouping and encoding of the key: the 128-bit key block is divided into several subkeys, whose length can be set according to the requirements of the polynomial operation, for example 32 bits or 64 bits. For each subkey, a random variable value and random coefficients are generated and substituted into the polynomial expression f(x, y, z) to obtain an encoded result. The random variable values and random coefficients are generated by the LFSR, the Bracewell sequence generator and other modules, in the same way as in the random polynomial coding of the plaintext data. XOR of the coding result: the result of random polynomial coding of each subkey is XORed with the corresponding round key generated by the key expansion algorithm. The XOR mixes the encoded key with the expanded key, increasing the randomness and unpredictability of the key. The result of the XOR is used as the randomized round key for the subsequent key addition.

Mixed XOR operation in the key adder. Logical operation on the round function result and the round key: when the key adder performs round key addition, a logical operation is applied to the round function operation result and the round key. Different logical operations, such as AND, OR and XOR, can be selected, which increases the diversity and randomness of the computation and makes the result of the key addition more complex and harder to predict. XOR of the logical operation result with the random polynomial operation result: the result of the logical operation on the round function operation result and the round key is XORed with the result of a random polynomial operation, which is obtained by applying random polynomial coding to the round function operation result. This XOR mixes the logical operation result with the random polynomial operation result to give the final key addition result.
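The two key adder steps described above, randomizing the round key and the mixed key addition, can be sketched as follows. The sketch assumes 32-bit words and reuses f(x, y, z) = (ax+b)y+cz with truncation to 32 bits; the function names and the string-keyed selection of the logical operation are illustrative assumptions.

```python
import operator

MASK32 = 0xFFFFFFFF  # assumed word width
LOGIC_OPS = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}

def poly(x, y, z, a, b, c):
    """f(x, y, z) = (a*x + b)*y + c*z, truncated to 32 bits (assumption)."""
    return ((a * x + b) * y + c * z) & MASK32

def randomized_round_key(subkey, round_key, y, z, coeffs):
    """Encode a subkey with f, then XOR it with the expanded round key."""
    return poly(subkey, y, z, *coeffs) ^ (round_key & MASK32)

def mixed_key_add(round_out, round_key, op, y, z, coeffs):
    """Logical operation on (round result, round key), XORed with f(round result)."""
    logic = LOGIC_OPS[op](round_out, round_key) & MASK32
    return logic ^ poly(round_out, y, z, *coeffs)

coeffs = (31, 72, 73)
print(randomized_round_key(5, 0xFF, 7, 11, coeffs))
print(mixed_key_add(5, 0xFF, "xor", 7, 11, coeffs))
```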

Random insertion of wait cycles. Design of the pseudo-random number generator: a random number sequence is generated by a linear feedback shift register (LFSR) or another pseudo-random number generator. A suitable LFSR feedback polynomial and initial state are selected according to the security requirements of SM4 encryption so that the generated random numbers have good statistical properties. The output of the pseudo-random number generator controls whether a wait cycle is inserted and how long the wait lasts. Random insertion: before each round of SM4 encryption, the output of the pseudo-random number generator determines whether a wait is inserted. If the pseudo-random value meets a preset condition (for example, it is greater than a certain threshold), a wait is inserted before the current round. The length of the wait may also be derived from the pseudo-random value, and different wait granularities may be selected, such as 1 clock cycle or 2 clock cycles.

Random polynomial coding of the final result: after all rounds of SM4 encryption are completed, random polynomial coding is applied to the final 128-bit encryption result. As in the encoding of the plaintext data, the final encryption result is divided into several sub-blocks whose length can be set according to the requirements of the polynomial operation. For each sub-block, a random variable value and random coefficients are generated and substituted into the polynomial expression f(x, y, z) to obtain an encoded result. The encoded results are concatenated to obtain the complete obfuscated ciphertext data.
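The random wait-cycle insertion described above can be modeled with a small cycle-level sketch. It assumes one pseudo-random word per round, a threshold comparison as the preset condition, a wait of 1 to `max_wait` cycles derived from the same word, and one clock cycle per round; all of these parameters are illustrative assumptions.

```python
def schedule_rounds(random_words, rounds=32, threshold=0x8000, max_wait=2):
    """Return the clock cycle at which each encryption round starts.

    random_words: one pseudo-random value per round (for example from an LFSR).
    A wait of 1..max_wait cycles is inserted before a round when its value
    exceeds `threshold`; each round is modeled as taking one clock cycle.
    """
    starts = []
    cycle = 0
    for r in range(rounds):
        word = random_words[r]
        if word > threshold:
            cycle += 1 + (word % max_wait)   # random wait length
        starts.append(cycle)
        cycle += 1                           # the round itself
    return starts

print(schedule_rounds([0x0001, 0x9000, 0x9001], rounds=3))
```

Because the start cycle of each round depends on the random words, the same plaintext and key produce different timing on different runs, which is the purpose of the randomization.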

Claims

1. An SM4 encryption method based on FPGA, comprising:

Obtain the plaintext data to be encrypted and input the plaintext data into the FPGA chip;

Multiple SM4 encryption cores are configured on the FPGA chip, and the SM4 encryption cores are used to perform parallel encryption operations on the plaintext data to obtain ciphertext data; each SM4 encryption core adopts a pipeline architecture, which includes a data acquisition device, a round function operator and a key adder, which respectively perform the three stages of the encryption operation: data acquisition, round function operation and key addition;

A physical-noise-based random number extraction circuit integrated on the FPGA chip generates an N-bit random number, and the generated N-bit random number is used as the encryption key of each SM4 encryption core for SM4 encryption operations;

In each SM4 encryption core, the handshake signals between the data acquisition device, round function operator and key adder are collected; the handshake signals reflect the operation status of the data acquisition device, round function operator and key adder;

According to the collected handshake signals, the operation efficiency of each SM4 encryption core is calculated;

According to the calculated operation efficiency, the priority scheduling algorithm is used to dynamically adjust the FPGA resource allocation between each SM4 encryption core;

In the process of each SM4 encryption core performing encryption operations, polynomial operations are mixed with logical operations to improve resistance to power analysis attacks; and wait cycles are randomly inserted in the encryption operation to improve the SM4 encryption's ability to resist side channel attacks by randomizing the operation timing;

Calculate the SM4 encryption core operation efficiency based on the collected handshake signals, including:

Parse the collected first handshake signal, second handshake signal and third handshake signal, extract timestamp information, and calculate the operation time of the data acquisition device, round function operator and key adder respectively according to the timestamp difference of adjacent handshake signals;

The data acquisition controller calculates the operation time T1 of the data acquisition device, T1 = T3_end - T1_start, according to the start timestamp T1_start of the first handshake signal and the end timestamp T3_end of the third handshake signal;

The round function controller calculates the operation time T2 of the round function operator, T2 = T2_end - T1_end, according to the end timestamp T1_end of the first handshake signal and the end timestamp T2_end of the second handshake signal;

The key controller calculates the operation time T3 of the key adder, T3 = T3_start - T2_end, according to the end timestamp T2_end of the second handshake signal and the start timestamp T3_start of the third handshake signal;

The computational efficiency E of the SM4 encryption core is calculated using the following formula:

E = 1/(W1*T1 + W2*T2 + W3*T3),

where W1, W2 and W3 are weight coefficients reflecting how strongly the data acquisition device, round function operator and key adder, respectively, affect the encryption efficiency of the SM4 encryption core.
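For illustration only, the efficiency calculation recited above can be written as a short function. The timestamps are assumed to be clock-cycle counts, the default weights of 1.0 are placeholders, and the function name `core_efficiency` is hypothetical.

```python
def core_efficiency(t1_start, t1_end, t2_end, t3_start, t3_end,
                    weights=(1.0, 1.0, 1.0)):
    """Compute E = 1 / (W1*T1 + W2*T2 + W3*T3) from handshake timestamps."""
    t1 = t3_end - t1_start      # T1 = T3_end - T1_start
    t2 = t2_end - t1_end        # T2 = T2_end - T1_end
    t3 = t3_start - t2_end      # T3 = T3_start - T2_end
    w1, w2, w3 = weights
    weighted_time = w1 * t1 + w2 * t2 + w3 * t3
    if weighted_time <= 0:
        raise ValueError("weighted operation time must be positive")
    return 1.0 / weighted_time

# Example timestamps in clock cycles (made-up values for the sketch).
E = core_efficiency(t1_start=0, t1_end=2, t2_end=34, t3_start=36, t3_end=40)
print(E)
```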

2. The FPGA-based SM4 encryption method according to claim 1, characterized in that:

Configure multiple SM4 encryption cores on the FPGA chip, including:

Use Verilog HDL or VHDL hardware description language to write multiple SM4 encryption algorithm IP core codes and obtain the RTL level description of the SM4 encryption IP core;

Input the RTL level description of the SM4 encryption IP core into the logic synthesis tool, perform logic synthesis on the SM4 encryption IP core code, convert the RTL level description into a logic gate-level netlist, and generate a logic gate-level netlist file of the SM4 encryption IP core;

The generated logic gate-level netlist file is used as input, and the placement and routing synthesis tool is used to perform FPGA placement and routing on the SM4 encryption IP core, map the logic units in the SM4 encryption IP core to the physical resources of the FPGA device, and complete the placement and routing between the physical resources; wherein the physical resources include lookup tables (LUTs) and flip-flops (FFs);

Before FPGA placement and routing, balance the clock tree on the FPGA chip to make the clocks of the various physical resources of the SM4 encryption IP core consistent; and put the SM4 encryption IP core into a preset initial state through asynchronous reset;

After completing the FPGA placement and routing, the SM4 encryption IP core is connected to the physical resources of the FPGA chip through port mapping to form a group of SM4 encryption cores for parallel SM4 encryption operations.

3. The FPGA-based SM4 encryption method according to claim 2, characterized in that:

The logic synthesis tool includes at least one of Synopsys Design Compiler, Cadence RTL Compiler and Mentor Graphics Precision;

The placement and routing synthesis tool includes at least one of Xilinx Vivado, Intel Quartus, and Lattice Diamond.

4. The FPGA-based SM4 encryption method according to claim 3, characterized in that:

In pipeline architecture:

The data acquisition device adopts an architecture that supports variable-length packet acquisition. According to the packet length of the SM4 encryption algorithm, the input plaintext data to be encrypted is divided into multiple data groups of equal length, and the divided data groups are output to the next level of the pipeline in sequence;

The round function operator adopts a programmable architecture that supports dynamic configuration of the number of encryption rounds. The number of SM4 encryption rounds is set through a configurable register, and the number of round function operations is controlled according to the configured number of rounds. In each round of round function operation, nonlinear transformation and linear transformation are performed on the input data group, and the round function operation result is output to the next stage of the pipeline;

The key adder adopts an architecture that supports key grouping and group key prediction. It divides the input encryption key into multiple key groups, expands the key groups by table lookup, predicts the next key group based on the current key group, and performs an XOR operation on the round function operation result and the predicted key group to generate the encryption result of the current round.

5. The FPGA-based SM4 encryption method according to claim 4, characterized in that:

Data acquisition device, including:

A packet storage area, used to cache plaintext data packets to be retrieved;

The circular buffer uses a linked list structure to manage plaintext data packets. The linked list structure includes multiple linked list nodes, a linked list head pointer and a linked list operation interface. Each linked list node is used to record the starting address and packet length of a plaintext data packet. The linked list head pointer points to the first node in the linked list. The linked list operation interface is used to manage insertion and deletion of the linked list.

The data acquisition controller traverses the linked list through the linked list operation interface according to the packet length of the SM4 encryption algorithm, and reads the plaintext data packets recorded in the linked list in sequence. Each time a data packet is read, the corresponding linked list node is deleted through the linked list operation interface to read the data of the variable-length packet; the data acquisition controller manages the caching and reading of the plaintext data packets in the circular buffer by controlling the write pointer and read pointer of the circular buffer.

6. The FPGA-based SM4 encryption method according to claim 5, characterized in that:

Round function operator, including:

The round number configuration register is used to store the round number configuration value of the SM4 encryption algorithm;

Multiple round function operation modules, each round function operation module is used to perform a round of encryption operation of the SM4 encryption algorithm, the encryption operation includes nonlinear transformation and linear transformation, the nonlinear transformation performs an S-box replacement operation on the input data group, and the linear transformation performs a linear transformation on the result after the S-box replacement;

The round function selector has an input end connected to the output of each round function operation module and an output end connected to the next stage of the pipeline. According to the round number configuration value, the output of the round function operation module corresponding to the round number is selected to the next stage of the pipeline;

The round key generation module generates the round key of each round through the key expansion algorithm according to the input initial key, and provides the generated round key to the corresponding round function operation module;

The round function controller controls the transmission of data packets between pipeline modules at various levels according to the process of the SM4 encryption algorithm, controls the round function selector to switch the round function, controls the round key generation module to provide round keys, and performs multi-round encryption operations of the SM4 encryption algorithm.

7. The FPGA-based SM4 encryption method according to claim 6, characterized in that:

Key adder, including:

The key grouping cache module caches the key used by the SM4 encryption algorithm and divides the input initial key into 4 subkeys and caches them;

The key expansion module expands the subkey output by the key grouping cache module by table lookup; and randomly permutes the data bits of the expanded subkey to generate a round key for the current encryption round;

The key XOR module performs a bitwise XOR operation on the intermediate result output by the round function operator and the round key generated by the key expansion module to obtain the output result of the current encryption round;

The key controller, according to the round number configuration of the SM4 encryption algorithm, triggers the key expansion module to generate round keys in each encryption round, controls the key XOR module to execute round key XOR, and passes the XOR result to the next round through the pipeline; the key controller uses a finite state machine to determine whether the key expansion and key XOR are completed according to the current round and the number of SM4 encryption rounds. If not, it enters the next round to continue execution until the configured number of SM4 encryption rounds is completed.

8. The FPGA-based SM4 encryption method according to any one of claims 4 to 7, characterized in that:

Collect handshake signals, including:

A pseudo-random number generator is set on the FPGA chip to generate different pseudo-random numbers as handshake passwords;

An independent channel is set between the data acquisition device, the round function operator and the key adder. The independent channel uses the high-speed serial transceiver of the FPGA and is independent of the data channel. A physical non-deterministic source PHYS is set at the transmitting and receiving ends of the independent channel. By using the PHYS output as the verification factor of the handshake password, a handshake password challenge-response verification mechanism based on physical non-determinism is constructed.

After the data acquisition device completes the plaintext group data acquisition, it obtains the handshake password 1 at the current moment from the pseudo-random number generator, packages the password 1 with the data acquisition completion identification code and the timestamp, generates a first handshake signal, and sends the first handshake signal to the round function operator through the handshake interface on the independent channel;

The round function operator verifies the password 1 in the first handshake signal. After the verification is passed, the handshake password 2 at the next moment is obtained from the pseudo-random number generator, and the password 2 is packaged with the round function operation completion identification code, the timestamp and the first handshake signal to generate a second handshake signal, and the second handshake signal is sent to the key adder through the handshake interface;

The key adder verifies the password 2 in the second handshake signal. After the verification is passed, it obtains the handshake password 3 at the next moment from the pseudo-random number generator, packages the password 3 with the encryption completion identification code, the timestamp, the first handshake signal and the second handshake signal, generates a third handshake signal, and returns the third handshake signal to the data acquisition device through the handshake interface;

The data acquisition device verifies the password 3 in the third handshake signal. After the verification is passed, a handshake process is completed and the next round of encryption process begins; wherein the handshake interface determines the validity of the handshake password by comparing the received handshake password with the PHYS output.

9. The FPGA-based SM4 encryption method according to claim 1, characterized in that:

In the process of performing encryption operations in each SM4 encryption core, polynomial operations and logical operations are mixed to perform calculations, including:

When the data acquisition controller reads the plaintext data group, the plaintext data is used as the input variable of the polynomial operation, and the linear feedback shift register LFSR is used to generate the random values of the variables x, y, and z in the polynomial operation expression f(x, y, z) = (ax+b)y+cz, and the coefficients a, b, and c of each term of f(x, y, z) are generated through the Bracewell sequence B(n); the encoded result is used as the output of the data acquisition controller and passed to the round function operator to cover up the statistical characteristics of the plaintext data;

When the round function operator performs nonlinear transformation, the operation process of the polynomial f(x, y, z) is expanded by the Horner algorithm;

When the key adder generates the round key, it first uses the polynomial f(x, y, z) to randomly encode each subkey in the key group, and then expands the key, mixes the expanded result with the random encoding result, and obtains the randomized round key;

When the key adder performs the round key addition operation, the round function operation result and the round key are first subjected to a logical operation, and then XORed with the random polynomial operation result to obtain a mixed key addition result;

After completing a round of SM4 encryption, the encryption result is mixed with the result of the random polynomial f(x, y, z) operation and XORed as the input for the next round of encryption; after the last round of encryption is completed, the final result is encoded again through the random polynomial f(x, y, z) operation to obtain the obfuscated ciphertext output.