Disclosure of Invention
In view of the shortcomings of the prior art, the technical problem to be solved by the invention is to provide a programmable AES encryption module hardware accelerator, an instruction set, and an operation method.
To solve this technical problem, the invention discloses a programmable AES encryption module hardware accelerator, an instruction set, and an operation method.
The accelerator is connected to a bus, exchanges data with the bus, and performs AES encryption and decryption on the data. The accelerator comprises:
a master device, used for sending and receiving the data of data carrying instructions; and a slave device, which communicates with other modules on the bus and receives the data carrying instructions and data of the master device.
Further, the accelerator also comprises an instruction stack and a decoder,
The slave device is connected with the decoder through an instruction stack and transmits instructions to the decoder.
Further, the accelerator also comprises a computing module and a carrying module,
The decoder decodes the instruction transmitted from the slave device and sends the decoded instruction to the computing module or the carrying module connected with the decoder;
the computing module is used for selecting different computing submodules to carry out the corresponding operation according to the instruction;
the carrying module is used for transferring data between the computing module and the register according to the data carrying instruction sent by the master device.
Further, the accelerator also comprises a register,
The register is used for storing data in the accelerator.
Further, the register includes:
special registers and general registers, wherein,
the special registers are used for storing specific encryption and decryption data;
the general registers are used for storing conventional data, including intermediate calculation results and auxiliary data;
the special registers include a field operation result register GF, a verification information result register TAG, an initialization vector register IV, a key register H, an encryption key register KEY, a plaintext data register PTEXT, and a ciphertext data register CTEXT.
Further, the register further includes:
a status register, which is connected with the slave device and is used for storing the current state of the accelerator.
Further, the computing submodule includes:
the inversion operation submodule INV, used for executing the inversion steps in the AES encryption and decryption processes, with four modes: byte-level inversion, word-level inversion, double-word-level inversion, and full inversion;
the AES core submodule AES_CORE, used for executing the core operation steps of AES encryption and decryption, including byte substitution (SubBytes), row shifting (ShiftRows), column mixing (MixColumns), and round key addition (AddRoundKey);
the Galois field operation submodule G2F, used for multiplication over a finite field, with the calculation result placed in the field operation result register GF;
the exclusive-or operation submodule XOR, used for performing exclusive-or operations;
the addition operation submodule ADD, used for performing addition operations;
the dedicated DMA submodule DMA, used only for carrying data in the SRAM into the special registers and the general registers;
the carrying submodule MOV, used for moving data between the special registers and the general registers.
Further, the bus, the master device and the slave device are an AHB bus, an AHB master device and an AHB slave device.
The invention also provides a programmable AES hardware accelerator instruction set for controlling a calculation module in the accelerator to perform related calculation, wherein the instruction set comprises:
an XOR instruction, which performs an exclusive-or operation; the inputs and the output are special registers or general registers, with 2 input registers and 1 output register;
an AES_CORE instruction, which performs an AES encryption or decryption operation; before the instruction is executed, the initialization vector register IV, the encryption key register KEY, and the plaintext data register PTEXT must be configured, and the output value is stored in the ciphertext data register CTEXT;
a G2F instruction, which performs a Galois field operation; before the instruction is executed, the field operation result register GF and the encryption key register KEY must be configured, and the output value is stored in the field operation result register GF;
an INV instruction, which performs an inversion operation in one of four modes: byte-level inversion, word-level inversion, double-word-level inversion, and full inversion;
an ADD instruction, which performs an addition operation, using a 32-bit adder to complete a 128-bit addition; the inputs and the output are special registers or general registers, and an input may also be a constant of a selected bit width; there are 2 inputs and 1 output register;
a DMA instruction, which carries external data, completing bidirectional data transfer between the SRAM and the general registers;
a MOV instruction, which transfers data into a register; the input is a constant, a general register, or a special register, and the output is a general register or a special register.
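The ADD instruction is described as completing a 128-bit addition with a 32-bit adder. A minimal sketch of that idea follows, assuming the adder is reused over four 32-bit limbs with the carry passed upward and the final carry dropped (the wrap-around behavior and the function name are assumptions, not taken from the specification):

```python
MASK32 = 0xFFFFFFFF

def add128_with_32bit_adder(a: int, b: int) -> int:
    """Add two 128-bit values using four passes through a 32-bit adder.

    The carry out of each 32-bit limb feeds the next, more significant limb.
    The final carry is dropped, so the result wraps modulo 2**128
    (an assumption; the source does not specify overflow handling).
    """
    result = 0
    carry = 0
    for limb in range(4):                      # least significant limb first
        shift = 32 * limb
        partial = ((a >> shift) & MASK32) + ((b >> shift) & MASK32) + carry
        result |= (partial & MASK32) << shift  # keep the low 32 bits of the sum
        carry = partial >> 32                  # 0 or 1, passed to the next limb
    return result
```

The same routine covers the constant-operand case: the constant is simply passed as `b`.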
The invention also provides a programmable AES hardware accelerator operation method, which is realized by adopting the accelerator, and comprises the following steps:
Step 1, judging whether the AES encryption or decryption operation of the current round is finished; if not, executing step 2, otherwise ending the current round;
Step 2, writing the received new instruction into the instruction stack;
Step 3, popping the lowest instruction from the instruction stack;
Step 4, decoding the popped instruction and distributing the decoded instruction to the computing module or the carrying module for processing;
Step 5, waiting for the processing in step 4 to be completed and checking whether the instruction stack is empty; if it is empty, ending the operation, otherwise returning to step 1.
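Steps 1 to 5 can be sketched as a small software loop. This is an illustrative model only: the function name, the `depth` parameter, and the `execute` callback are hypothetical, and the sketch treats the run as finished only when the stack is empty and no further instructions are waiting to be written.

```python
from collections import deque

def run_accelerator(program, execute, depth=4):
    """Model of the five-step operation method.

    program: instructions waiting to be written (hypothetical software view)
    execute: callback standing in for decode plus dispatch to a module
    depth:   assumed capacity of the instruction stack
    """
    pending = deque(program)
    stack = deque()
    trace = []
    while pending or stack:                      # step 1: is the round finished?
        while pending and len(stack) < depth:    # step 2: write new instructions
            stack.append(pending.popleft())
        instr = stack.popleft()                  # step 3: pop the lowest instruction
        trace.append(execute(instr))             # step 4: decode and dispatch
        if not stack and not pending:            # step 5: stack empty, so stop
            break
    return trace
```

For example, `run_accelerator(["XOR", "AES_CORE", "DMA"], str.lower)` processes the three instructions in the order they were written.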
The beneficial effects are that:
1. The invention improves the AES encryption speed and markedly accelerates data processing compared with a pure software scheme.
2. The invention reduces the hardware cost and balances speed against cost by combining hardware acceleration with software control.
3. The invention reduces the load on the CPU by using DMA to carry instructions efficiently.
4. The invention improves the flexibility of the system and supports two instruction input modes of CPU and DMA.
5. The invention optimizes the energy utilization rate and reduces the system power consumption through accurate configuration and high-efficiency data interaction.
Detailed Description
The design concept of the invention is as follows: the CPU pre-fills the instructions of the AES hardware accelerator, and when the filled instructions are about to finish executing, the accelerator triggers the CPU to fill in further instructions. As long as the instruction length is reasonable, the AES operation as a whole rarely needs to interrupt the CPU, which largely preserves the working efficiency of the CPU.
The invention realizes AES encryption of large-scale data by cooperating with the CPU. The accelerator combines the high speed of hardware acceleration and the low cost of software encryption, supports two instruction input modes of CPU and DMA, and improves the system efficiency and flexibility. The AES module is in data interaction with other system components through the AHB bus, reduces the CPU burden through automatic interrupt and DMA, optimizes the encryption processing process, and is suitable for high-efficiency and safe data processing scenes.
And DMA is adopted to carry instructions, so that the burden of a CPU is reduced, and the CPU can be focused on other control tasks, thereby optimizing the system performance and energy efficiency. The method supports two instruction input modes of CPU and DMA, provides flexibility and adaptability, and meets the requirements of different application scenes.
The invention completes the AES encryption of large-scale data through cooperation with the CPU. It combines the high encryption speed of a pure hardware AES accelerator with the low hardware cost of software encryption, balancing speed against area, and frees the CPU for other control work so that the whole system runs more efficiently.
The specific technical scheme of the invention is as follows:
The position and operating principle of the programmable AES module in the overall system are shown in Fig. 1. In the whole system, the AES module, DMA, CPU, FLASH module, SRAM module, and other components exchange data through the AHB (Advanced High-performance Bus) to form a complete communication framework.
When the AES module starts to work, the CPU first configures the registers of the AES module, which starts the AES workflow.
After configuration is completed, the system writes the processing instructions to the AES instruction stack. This operation may be done by a CPU or DMA, while instructions are typically stored in SRAM or FLASH of the system.
If the CPU is used to write instructions, the AES module issues an interrupt signal to the CPU when only the last instruction remains in the instruction stack. After receiving the interrupt signal, the CPU quickly writes new instructions into the instruction stack to keep the stack fully loaded. This interrupt mechanism ensures that instructions are delivered and processed in time, and prevents an empty instruction stack from reducing the operating efficiency of the AES module.
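The refill-on-interrupt behavior can be modeled in a few lines. The sketch below is a software illustration under assumed names (`InstructionStack`, `simulate_cpu_refill`) and an assumed stack capacity; the interrupt is modeled as a callback that fires when exactly one instruction remains after a pop.

```python
from collections import deque

class InstructionStack:
    """Bounded instruction stack that signals when one instruction remains."""

    def __init__(self, capacity, on_last_instruction):
        self.capacity = capacity
        self.slots = deque()
        self.on_last_instruction = on_last_instruction  # models the interrupt line

    def is_full(self):
        return len(self.slots) >= self.capacity

    def push(self, instr):
        self.slots.append(instr)

    def pop(self):
        instr = self.slots.popleft()
        if len(self.slots) == 1:            # only the last instruction remains
            self.on_last_instruction(self)
        return instr

def simulate_cpu_refill(program, capacity=4):
    pending = deque(program)                # instructions still held by the CPU
    interrupts = 0

    def refill(stack):
        while pending and not stack.is_full():
            stack.push(pending.popleft())

    def cpu_isr(stack):                     # interrupt service routine
        nonlocal interrupts
        interrupts += 1
        refill(stack)

    stack = InstructionStack(capacity, cpu_isr)
    refill(stack)                           # initial fill after configuration
    executed = []
    while stack.slots:
        executed.append(stack.pop())
    return executed, interrupts
```

With ten instructions and a four-entry stack, the CPU is interrupted three times instead of once per instruction, which is the effect the mechanism is meant to achieve.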
Another way is to use the general-purpose DMA as the medium for instruction transmission. The general DMA is first preconfigured and then started. Whenever the AES instruction stack is not full, the AES module issues a request signal (req) to the general DMA. After receiving the request signal, the general DMA carries instructions from FLASH or SRAM and writes them into the AES instruction stack. While the AES module is processing an existing instruction, the general DMA continues to load new instructions into the AES instruction stack, so instruction execution and instruction loading proceed in parallel. This stage is truly finished only when the AES module has completed all the work in the stack and the DMA has completed all of its instruction handling.
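A cycle-level sketch of this request handshake is given below. It is an illustration under assumptions not stated in the specification: the DMA moves at most one instruction per cycle, each instruction occupies the AES module for a fixed `exec_cycles` (at least 1), and the names are hypothetical.

```python
from collections import deque

def simulate_dma_feed(program, capacity=4, exec_cycles=3):
    """Model the req handshake between the AES stack and the general DMA."""
    memory = deque(program)     # instructions stored in FLASH or SRAM
    stack = deque()             # AES instruction stack
    executed = []
    busy = 0                    # cycles left on the instruction in flight
    cycles = 0
    while memory or stack or busy:
        cycles += 1
        # AES side: finish the current instruction or start the next one.
        if busy:
            busy -= 1
        elif stack:
            executed.append(stack.popleft())
            busy = exec_cycles - 1
        # DMA side: req is asserted whenever the stack is not full.
        req = len(stack) < capacity
        if req and memory:
            stack.append(memory.popleft())
    return executed, cycles
```

The loop ends only when the memory is drained, the stack is empty, and no instruction is in flight, matching the completion condition described above.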
Such a programmable AES encryption module has several significant advantages. First, using DMA to transmit instructions greatly reduces the CPU workload and improves the overall operating efficiency of the system; DMA carries instructions quickly and efficiently and keeps instruction loading from becoming a system bottleneck. Second, when the CPU is used to write instructions, the automatic interrupt mechanism ensures that instructions are replenished in time and keeps the AES instruction stack loaded, which improves the response speed of the system. Third, the design supports both CPU and DMA instruction input, so it can switch between them to suit different application scenarios. Fourth, by configuring the registers, the working state of the AES module can be managed effectively and resources used on demand, which improves energy utilization and reduces system power consumption. Finally, the AES module exchanges data with other system components (such as the CPU, DMA, FLASH, and SRAM) through the AHB bus, forming an efficient cooperative mechanism for encryption and decryption tasks. Overall, the design improves the working efficiency of the AES encryption module and enhances the flexibility and adaptability of the system, making it suitable for scenarios that need efficient and secure data processing.
Fig. 2 depicts the internal architecture of an Advanced Encryption Standard (AES) encryption and decryption hardware design proposed by the present invention. The architecture in the illustration includes several main parts, each with its own unique functions and roles to ensure efficient operation of the entire AES system.
First is the AHB SLV (AHB slave), a slave device that communicates with other modules on the Advanced High-performance Bus (AHB). The AHB SLV receives instructions and data from the AHB master (AHB MST) and passes the instructions to the decoder through the instruction stack. In addition, the AHB SLV may also transfer data directly to a computing module or another data path that needs to process it.
The status register is another key component for storing the current status of the AES encryptor. The status register can record the progress of the current operation and whether the system is operating normally. This is critical to fault recovery and system stability.
The decoder is the next key component. It decodes the instructions received from the AHB SLV and distributes them to the corresponding modules. The decoder allows the AES system to flexibly handle various operation requests from the master device, such as encryption, decryption, and data handling.
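The routing role of the decoder can be illustrated with a short sketch. The textual instruction syntax used here (`MNEMONIC operand, operand`) is hypothetical, as the specification does not give an encoding, and the split of mnemonics between the computing module and the carrying module is an assumption based on which instructions move data.

```python
# Assumed routing: arithmetic and cipher operations go to the computing module,
# data movement goes to the carrying module.
COMPUTE_OPS = {"XOR", "AES_CORE", "G2F", "INV", "ADD"}
CARRY_OPS = {"DMA", "MOV"}

def decode(instruction: str):
    """Split 'MNEMONIC op1, op2, ...' and choose the target module."""
    mnemonic, _, rest = instruction.strip().partition(" ")
    mnemonic = mnemonic.upper()
    operands = [tok.strip() for tok in rest.split(",") if tok.strip()]
    if mnemonic in COMPUTE_OPS:
        target = "computing"
    elif mnemonic in CARRY_OPS:
        target = "carrying"
    else:
        raise ValueError(f"unknown instruction: {mnemonic}")
    return target, mnemonic, operands
```

For instance, `decode("XOR PTEXT, IV, PTEXT")` yields the computing module as the target with three register operands.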
The computing module comprises a plurality of sub-modules, each sub-module realizing a specific computing function:
INV: the inversion operation module, which performs the inversion steps in the AES algorithm. It inverts data on input and output in one of four modes: byte-level inversion, word-level inversion, double-word-level inversion, and full inversion.
AES_CORE: the AES core module, which performs the core operation steps of AES encryption and decryption, including byte substitution, row shifting, column mixing, and round key addition. Before encryption or decryption, data such as IV, KEY, and PTEXT must be configured in advance.
G2F: the Galois field operation module, which performs multiplication over a finite field, a part of the AES algorithm. The calculation result is placed in the special register GF.
XOR: the exclusive-or operation module, which performs exclusive-or operations, an indispensable step in the AES algorithm.
ADD: the addition module, which mainly performs the arithmetic needed by the AES algorithm but may also be extended for other purposes.
DMA: the dedicated DMA module, which is used only to handle data in SRAM, mainly carrying SRAM data into the special and general registers.
MOV: the move module, which moves data between the special and general registers.
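The four INV modes are named but not defined in detail. The following sketch shows one plausible reading, stated here as an assumption: the byte, word, and double-word modes reverse the order of 1-byte, 4-byte, and 8-byte units within a 128-bit value, and the full mode reverses all 128 bits. The mode names and function name are hypothetical.

```python
def inv(value: bytes, mode: str) -> bytes:
    """Reorder a 128-bit value according to an assumed INV mode."""
    if len(value) != 16:
        raise ValueError("expected a 128-bit (16-byte) value")
    if mode == "full":
        # Reverse the order of all 128 bits.
        n = int.from_bytes(value, "big")
        return int(f"{n:0128b}"[::-1], 2).to_bytes(16, "big")
    unit = {"byte": 1, "word": 4, "dword": 8}[mode]   # unit size in bytes
    chunks = [value[i:i + unit] for i in range(0, 16, unit)]
    return b"".join(reversed(chunks))
```

Under this reading every mode is its own inverse, so applying the same mode on input and on output restores the original byte order.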
The handling module is responsible for data transfer between the different computing modules and registers. It ensures that data can flow smoothly and efficiently between different calculation steps. This is a critical part of ensuring efficient operation of the system.
The dedicated registers and the general purpose registers together constitute the memory architecture of the AES system. The special purpose registers store specific encrypted and decrypted data, including Initialization Vector (IV), encryption KEY (KEY), plaintext data (PTEXT), ciphertext data (CTEXT), and the like. General purpose registers are used to store conventional data, such as reg_a (128 bits), reg_b (128 bits), and reg_c (128 bits), which are used to store temporary calculation results or other auxiliary data.
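The register organization can be pictured as a small register file. In the sketch below the register names come from the description, while the class name, the uniform 128-bit width for the special registers, and the `mov` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field

SPECIAL = ("GF", "TAG", "IV", "H", "KEY", "PTEXT", "CTEXT")
GENERAL = ("reg_a", "reg_b", "reg_c")
MASK128 = (1 << 128) - 1   # assumed width of every register

@dataclass
class RegisterFile:
    regs: dict = field(default_factory=lambda: dict.fromkeys(SPECIAL + GENERAL, 0))

    def read(self, name: str) -> int:
        return self.regs[name]

    def write(self, name: str, value: int) -> None:
        if name not in self.regs:
            raise KeyError(f"unknown register: {name}")
        self.regs[name] = value & MASK128      # truncate to the register width

    def mov(self, src, dst: str) -> None:
        """Model of MOV: the source is a constant or a register name."""
        value = src if isinstance(src, int) else self.read(src)
        self.write(dst, value)
```

A call such as `rf.mov("CTEXT", "IV")` corresponds to moving the ciphertext register into the initialization vector register, and `rf.mov(2, "reg_a")` loads a constant.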
The master device is used for sending and receiving the data of data handling instructions. Through the master device built into the AES system, the sending and receiving of data can be actively controlled, which gives the AES system greater operating flexibility. In addition, the master device focuses on data transmission while the slave device focuses on sending and receiving AES instructions; separating instructions from data makes transmission simpler and more efficient. The benefits of this design are mainly reflected in the following aspects:
1. Modular design: the computing modules and dedicated modules are independent of one another and responsible for different tasks, so the design is easy to test and maintain. If a module fails, only that module needs to be debugged and repaired, without affecting the whole system.
2. Efficient data handling: the use of the handling module and the DMA module greatly improves the data transmission speed and reduces the processing time. This is particularly important for application scenarios that require large amounts of data encryption and decryption.
3. Flexible instruction processing: through the decoder, tasks can be distributed according to different instructions, so that the system can efficiently process different types of operation requests. This flexibility enables the system to accommodate varying application requirements.
4. Parallel processing capability: multiple operation modules can perform different steps at the same time, which improves the overall processing speed. This is especially important when large amounts of data need to be processed quickly.
In addition, although the AES_CORE and G2F operations could be carried out using basic operations such as exclusive-or and addition, doing so would sharply increase the number of instructions and significantly lengthen the encryption time. The AES_CORE and G2F operations are therefore implemented fully in hardware, as a compromise between area and speed.
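To give a sense of why a software-style implementation of G2F would be instruction-heavy, the sketch below multiplies two 128-bit field elements with nothing but shifts and exclusive-or, which takes 128 loop iterations per multiplication. It assumes the field and bit ordering used by GCM (polynomial x^128 + x^7 + x^2 + x + 1, leftmost bit as the lowest-degree coefficient); the specification itself only says "multiplication over a finite field".

```python
# Reduction constant for x^128 + x^7 + x^2 + x + 1 in GCM bit order.
R = 0xE1 << 120

def gf128_mul(x: int, y: int) -> int:
    """Bitwise multiplication in GF(2^128), GCM convention (assumed)."""
    z = 0
    v = y
    for i in range(128):
        if (x >> (127 - i)) & 1:     # bit i of x, counting from the left
            z ^= v
        # Multiply v by the field element "x": shift right, reduce on carry-out.
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z
```

Each call performs up to 128 conditional XORs and 128 shifts, which a dedicated hardware multiplier avoids issuing as separate instructions.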
As shown in fig. 3, an operation method is designed for the programmable AES hardware accelerator of the foregoing design. The method starts with a flow that first checks if the current AES operation has been completed. If not, the system writes a new instruction into the instruction stack and pops the next instruction in the instruction stack. And then, decoding the popped instruction and distributing the popped instruction to corresponding hardware resources for processing. After waiting for the current operation to complete, it is checked again whether the instruction stack is empty. If the instruction stack is not empty, the steps are continuously and circularly executed until all the instructions are executed. The process ensures that the AES encryption module can continuously and efficiently load, decode and execute the instructions, and simultaneously ensures the order and the integrity of the instruction execution.
The invention also provides an instruction set for the programmable AES hardware accelerator described above, as shown in Table 1:
Table 1. Programmable AES hardware accelerator instruction set
Examples:
Since AES today has multiple derived modes, such as ECB, CBC, CTR, CCM, GCM, and GMAC, each with its own fixed sequence of operations, instructions must be written separately for each particular mode.
As shown in fig. 4, CBC encryption using the AES hardware accelerator described above is exemplified as follows:
1. Configure the external general DMA to carry data into registers such as PTEXT, IV, and KEY;
2. XOR PTEXT and IV using an XOR instruction and store the result in PTEXT;
3. Obtain the final data using the AES_CORE instruction and put it in CTEXT;
4. Carry the CTEXT data into the SRAM for storage using a DMA instruction;
5. Carry the CTEXT data into IV using the MOV instruction;
6. Repeat steps 2-5 until all the data has been processed.
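The CBC steps above can be sketched as a register-level walk-through. The block function below is a deliberately simple placeholder and is not AES; it stands in for the AES_CORE instruction so that the chaining logic can be shown in a self-contained way. All function names are hypothetical.

```python
def toy_block_encrypt(block: bytes, key: bytes) -> bytes:
    """Placeholder for AES_CORE. NOT AES: XOR with the key, then rotate."""
    mixed = bytes(b ^ k for b, k in zip(block, key))
    return mixed[1:] + mixed[:1]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(blocks, key, iv, encrypt=toy_block_encrypt):
    regs = {"KEY": key, "IV": iv, "PTEXT": b"", "CTEXT": b""}
    sram = []                                                  # ciphertext storage
    for block in blocks:
        regs["PTEXT"] = block                                  # step 1: DMA plaintext in
        regs["PTEXT"] = xor_bytes(regs["PTEXT"], regs["IV"])   # step 2: XOR
        regs["CTEXT"] = encrypt(regs["PTEXT"], regs["KEY"])    # step 3: AES_CORE
        sram.append(regs["CTEXT"])                             # step 4: DMA to SRAM
        regs["IV"] = regs["CTEXT"]                             # step 5: MOV CTEXT to IV
    return sram                                                # step 6: loop until done
```

Because each ciphertext block is fed back through IV, two identical plaintext blocks produce different ciphertext blocks, which is the defining property of CBC chaining.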
As shown in fig. 5, GCM encryption using the AES hardware accelerator described above is exemplified as follows:
1. Configure the external general DMA to carry data into registers such as PTEXT, IV, and KEY;
2. Obtain the final data using the AES_CORE operation and put it in CTEXT;
3. Carry the CTEXT data into the H special register using the MOV instruction;
4. Move the value of the data register reg_a into the GF special register using the MOV instruction;
5. Perform the Galois operation using the G2F instruction and store the result in the GF special register;
6. Add reg_a and the constant 1 using the ADD instruction and put the result in reg_a;
7. Repeat steps 4-6 to finish the Header Phase;
8. Move the constant 2 into reg_a using the MOV instruction;
9. Add IV and reg_a using the ADD instruction and put the result in IV;
10. Perform the AES operation using the AES_CORE instruction;
11. Carry the plaintext into PTEXT using a DMA instruction;
12. XOR CTEXT and PTEXT using an XOR instruction and store the result in CTEXT;
13. Carry the ciphertext in CTEXT into the SRAM using a DMA instruction;
14. XOR GF and CTEXT using an XOR instruction and put the result in the GF register;
15. Start the Galois operation using the G2F instruction;
16. Add IV and 1 using the ADD instruction and put the result in IV;
17. Repeat steps 10-16 until the Payload Phase operations are completed;
18. XOR GF and CTEXT using an XOR instruction and put the result in the GF register;
19. Start the Galois operation using the G2F instruction;
20. Carry the original IV into the IV special register using a DMA instruction;
21. Perform the AES operation using the AES_CORE instruction;
22. XOR GF and CTEXT using an XOR instruction and put the result in the TAG register;
23. Carry TAG directly to the SRAM using a DMA instruction to complete the Final Phase.
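The Payload Phase loop (steps 10-16) can be sketched as follows. This is a minimal model under several assumptions: the block function is a placeholder and not AES, the G2F instruction is taken to multiply GF by the H register in the GCM field, and the counter update is a plain 128-bit addition as the steps describe (standard GCM wraps only the low 32 bits of the counter). Function names are hypothetical.

```python
MASK128 = (1 << 128) - 1
R = 0xE1 << 120    # GCM reduction constant (assumed field)

def gf128_mul(x: int, y: int) -> int:
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

def toy_encrypt(block: int, key: int) -> int:
    """Placeholder for AES_CORE. NOT AES; any keyed function works for CTR."""
    return ((block ^ key) * 0x9E3779B97F4A7C15 + 1) & MASK128

def gcm_payload_phase(blocks, key, h, counter, gf=0, encrypt=toy_encrypt):
    regs = {"KEY": key, "H": h, "IV": counter, "GF": gf}
    sram = []
    for p in blocks:
        regs["CTEXT"] = encrypt(regs["IV"], regs["KEY"])   # step 10: AES_CORE
        regs["PTEXT"] = p                                  # step 11: DMA plaintext in
        regs["CTEXT"] ^= regs["PTEXT"]                     # step 12: XOR
        sram.append(regs["CTEXT"])                         # step 13: DMA to SRAM
        regs["GF"] ^= regs["CTEXT"]                        # step 14: XOR into GF
        regs["GF"] = gf128_mul(regs["GF"], regs["H"])      # step 15: G2F
        regs["IV"] = (regs["IV"] + 1) & MASK128            # step 16: ADD IV, 1
    return sram, regs["GF"], regs["IV"]
```

Since the keystream depends only on the counter and the key, running the same loop over the ciphertext with the same starting counter recovers the plaintext.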
In a specific implementation, the present application provides a computer storage medium and a corresponding data processing unit. The computer storage medium can store a computer program which, when executed by the data processing unit, performs some or all of the steps of the programmable AES encryption module hardware accelerator, instruction set, and operation method provided by the present application. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random-access memory (RAM), or the like.
It will be apparent to those skilled in the art that the technical solutions in the embodiments of the present invention may be implemented by means of a computer program and its corresponding general hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be embodied essentially or in the form of a computer program, i.e. a software product, which may be stored in a storage medium, and include several instructions to cause a device (which may be a personal computer, a server, a single-chip microcomputer, an MCU or a network device, etc.) including a data processing unit to perform the methods described in the embodiments or some parts of the embodiments of the present invention.
The invention provides a programmable AES encryption module hardware accelerator, an instruction set, and an operation method. There are many methods and ways to implement this technical scheme, and the above are only preferred embodiments of the invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the invention. Components not explicitly described in this embodiment can be implemented using the prior art.