Disclosure of Invention
The present application aims to provide a data query method, a data query system, a heterogeneous computing acceleration platform and a storage medium, which can improve the data interaction rate between a CPU chip and an FPGA chip and thereby improve the data query efficiency of the heterogeneous computing acceleration platform.
To solve the above technical problem, the present application provides a data query method, which is applied to a heterogeneous computing acceleration platform including a CPU chip and an FPGA chip. The data query method includes:
when the CPU chip receives a data query instruction, determining a target data table and a data calculation rule according to the data query instruction;
writing the target data table into a host memory space, and transmitting the data calculation rule to the FPGA chip;
and controlling the FPGA chip to read the target data table in the host memory space through a coherent cache interface, and performing a data query operation on the target data table according to the data calculation rule to obtain a query result, so as to return the query result to the CPU chip.
Optionally, transmitting the data calculation rule to the FPGA chip includes:
transmitting the data calculation rule to the FPGA chip by writing a register.
Optionally, determining the target data table according to the data query instruction includes:
analyzing the data query instruction to obtain a target data table type, and taking the data table in a database corresponding to the target data table type as the target data table.
Optionally, the CPU chip is connected to a second coherent cache interface of the FPGA chip through a first coherent cache interface.
Optionally, after writing the target data table into the host memory space, the method further includes:
the CPU chip sends notification information to the FPGA chip, so that the FPGA chip performs the operation of reading the target data table in the host memory space through the coherent cache interface.
Optionally, after performing a data query operation on the target data table according to the data calculation rule to obtain a query result, the method further includes:
the FPGA chip stores the query result in the host memory space through the coherent cache interface and records the memory address of the query result in the memory space;
and the FPGA chip sends the memory address to the CPU chip, so that the CPU chip can read the query result according to the memory address.
Optionally, after the FPGA chip stores the query result to the host memory space through the coherent cache interface, the method further includes:
the FPGA chip sends an interrupt signal to the CPU chip, so that the CPU chip can read the query result according to the received memory address.
The application also provides a data query system, which is applied to a heterogeneous computing acceleration platform comprising a CPU chip and an FPGA chip, and the system comprises:
the instruction receiving module is used for determining a target data table and a data calculation rule according to a data query instruction when the CPU chip receives the data query instruction;
the data transmission module is used for writing the target data table into a host memory space and transmitting the data calculation rule to the FPGA chip;
and the data query module is used for controlling the FPGA chip to read the target data table in the host memory space through a coherent cache interface, and performing a data query operation on the target data table according to the data calculation rule to obtain a query result, so as to return the query result to the CPU chip.
The present application also provides a storage medium on which a computer program is stored, and the computer program, when executed, implements the steps of the above data query method.
The present application also provides a heterogeneous computing acceleration platform, which includes a CPU chip and an FPGA chip. The CPU chip is used for determining a target data table and a data calculation rule according to a data query instruction when the data query instruction is received, writing the target data table into a host memory space, and transmitting the data calculation rule to the FPGA chip. The FPGA chip is used for reading the target data table in the host memory space through a coherent cache interface, performing a data query operation on the target data table according to the data calculation rule to obtain a query result, and returning the query result to the CPU chip.
The present application provides a data query method applied to a heterogeneous computing acceleration platform including a CPU chip and an FPGA chip, and the data query method includes the following steps: when the CPU chip receives a data query instruction, determining a target data table and a data calculation rule according to the data query instruction; writing the target data table into a host memory space, and transmitting the data calculation rule to the FPGA chip; and controlling the FPGA chip to read the target data table in the host memory space through a coherent cache interface, and performing a data query operation on the target data table according to the data calculation rule to obtain a query result, so as to return the query result to the CPU chip.
In the present application, after receiving a data query instruction, the CPU chip determines the target data table to be queried and the data calculation rule used by the query. The CPU chip writes the target data table into the host memory space and transmits the data calculation rule to the FPGA chip, and the FPGA chip can directly read the target data table in the host memory space through the coherent cache interface and perform the data query operation corresponding to the data calculation rule to obtain a query result. In this process, the FPGA chip acquires the memory data directly through the coherent cache interface; the address space of the FPGA end does not need to be mapped into the memory address space of the CPU end, and the corresponding host memory space does not need to be accessed through a PCIe controller. The present application also provides a data query system, a heterogeneous computing acceleration platform and a storage medium, which have the above beneficial effects and are not repeated herein.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a data query method according to an embodiment of the present application.
The specific steps may include:
S101: When the CPU chip receives a data query instruction, determining a target data table and a data calculation rule according to the data query instruction;
The present embodiment may be applied to a heterogeneous computing acceleration platform, where the heterogeneous computing acceleration platform may include a host end provided with a CPU chip and an acceleration end provided with an FPGA chip. When the CPU (Central Processing Unit) chip receives a data query instruction, the data query instruction may be parsed, and a target data table and a data calculation rule may be determined according to the parsing result. Specifically, the target data table is the object of the data query operation, there may be any number of target data tables, and the data calculation rule is the specific query scheme applied to the target data table.
As a possible implementation manner, the present embodiment may analyze the data query instruction to obtain a target data table type, and use a data table in the database corresponding to the target data table type as the target data table.
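For illustration only, the following minimal C sketch shows one way such parsing might be performed on the CPU side. The fixed query format, the structure names and the single-predicate calculation rule are assumptions introduced for the sketch and are not mandated by this embodiment.

    #include <stdio.h>

    /* Illustrative, assumed representation of the parsed query. */
    struct data_calc_rule {
        char column[32];   /* column the rule applies to           */
        char op;           /* comparison operator: '>', '<' or '=' */
        long operand;      /* constant compared against the column */
    };

    struct parsed_query {
        char table_name[32];        /* target data table type */
        struct data_calc_rule rule; /* data calculation rule   */
    };

    /* Parse a simplified query of the form
     *   SELECT * FROM <table> WHERE <column> <op> <constant>
     * Returns 0 on success, -1 if the instruction does not match. */
    static int parse_query_instruction(const char *sql, struct parsed_query *out)
    {
        char opbuf[2] = {0};
        if (sscanf(sql, "SELECT * FROM %31s WHERE %31s %1s %ld",
                   out->table_name, out->rule.column, opbuf,
                   &out->rule.operand) != 4)
            return -1;
        out->rule.op = opbuf[0];
        return 0;
    }

    int main(void)
    {
        struct parsed_query q;
        if (parse_query_instruction("SELECT * FROM orders WHERE amount > 100", &q) == 0)
            printf("table=%s rule=%s %c %ld\n",
                   q.table_name, q.rule.column, q.rule.op, q.rule.operand);
        return 0;
    }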
S102: writing the target data table into a host memory space, and transmitting the data calculation rule to the FPGA chip;
In this embodiment, the CPU chip is responsible for receiving and distributing data query tasks. After the target data table is determined, the target data table may be written into the host memory space at the host end, and the data calculation rule is transmitted to the FPGA (Field Programmable Gate Array) chip, so that the FPGA chip performs the specific data query operation.
As a possible implementation manner, the present embodiment may transmit the data calculation rule to the FPGA chip by writing a register.
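For example, the register write may be performed over a memory-mapped register space of the FPGA card, as sketched below in C. The register offsets, the sysfs path of the PCIe BAR and the encoding of the operation type are hypothetical and only serve to make the step concrete.

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Assumed register map of the FPGA accelerator (illustrative only). */
    enum {
        REG_OP_TYPE    = 0x00, /* comparison / add / sub / mul / div code */
        REG_OPERAND_LO = 0x04, /* low 32 bits of the rule's constant      */
        REG_OPERAND_HI = 0x08, /* high 32 bits of the rule's constant     */
        REG_DOORBELL   = 0x0C, /* written last to notify the FPGA         */
    };

    static void reg_write32(volatile uint32_t *bar, uint32_t off, uint32_t val)
    {
        bar[off / 4] = val;   /* MMIO store into the FPGA register space */
    }

    int main(void)
    {
        /* Hypothetical sysfs path of the FPGA card's BAR0. */
        int fd = open("/sys/bus/pci/devices/0000:3b:00.0/resource0",
                      O_RDWR | O_SYNC);
        if (fd < 0) { perror("open"); return 1; }

        volatile uint32_t *bar = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                      MAP_SHARED, fd, 0);
        if (bar == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        uint64_t operand = 100;  /* constant taken from the calculation rule */
        reg_write32(bar, REG_OP_TYPE, 0x1);    /* 0x1 = "greater than" (assumed) */
        reg_write32(bar, REG_OPERAND_LO, (uint32_t)operand);
        reg_write32(bar, REG_OPERAND_HI, (uint32_t)(operand >> 32));
        reg_write32(bar, REG_DOORBELL, 1);     /* tell the FPGA the rule is ready */

        munmap((void *)bar, 4096);
        close(fd);
        return 0;
    }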
S103: Controlling the FPGA chip to read the target data table in the host memory space through a coherent cache interface, and performing a data query operation on the target data table according to the data calculation rule to obtain a query result, so as to return the query result to the CPU chip.
After receiving the data calculation rule, the FPGA chip can read the target data table in the host memory space through the coherent cache interface. After caching the target data in an internal FIFO, the FPGA chip can notify its accelerated computation function unit to perform operations such as data comparison, addition, subtraction, multiplication and division according to the operation type recorded in the register. After the computation is finished, the FPGA chip can write the row data that meets the conditions back into the host memory space and notify the CPU chip that processing is complete.
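For clarity, the following C sketch models this FPGA-side data path in software; the actual logic would be implemented in RTL or HLS on the FPGA. Rows are streamed from the shared table buffer into a small FIFO, the comparison selected by the register-programmed operation type is applied, and matching rows are written back to a result buffer. The row layout, the FIFO depth and the operation codes are assumptions of the sketch.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #define FIFO_DEPTH 8   /* assumed depth of the on-chip FIFO */

    /* Assumed fixed-width row layout of the target data table. */
    struct row { uint32_t id; int64_t value; };

    /* Operation codes as they might be programmed into the rule register. */
    enum op_type { OP_GT = 1, OP_LT = 2, OP_EQ = 3 };

    static int rule_matches(const struct row *r, enum op_type op, int64_t operand)
    {
        switch (op) {
        case OP_GT: return r->value >  operand;
        case OP_LT: return r->value <  operand;
        case OP_EQ: return r->value == operand;
        }
        return 0;
    }

    /* Software model of the accelerated computation function unit: stream rows
     * through the FIFO, filter them by the rule, write the hits back.          */
    static size_t fpga_query(const struct row *table, size_t nrows,
                             enum op_type op, int64_t operand, struct row *result)
    {
        struct row fifo[FIFO_DEPTH];
        size_t hits = 0;

        for (size_t base = 0; base < nrows; base += FIFO_DEPTH) {
            size_t burst = (nrows - base < FIFO_DEPTH) ? nrows - base : FIFO_DEPTH;

            /* "Read the target data table through the coherent cache interface"
             * is modelled here as a plain burst copy into the FIFO.             */
            for (size_t i = 0; i < burst; i++)
                fifo[i] = table[base + i];

            /* Apply the register-selected operation and write matching rows back. */
            for (size_t i = 0; i < burst; i++)
                if (rule_matches(&fifo[i], op, operand))
                    result[hits++] = fifo[i];
        }
        return hits;
    }

    int main(void)
    {
        struct row table[] = { {1, 50}, {2, 150}, {3, 300} };
        struct row result[3];
        size_t n = fpga_query(table, 3, OP_GT, 100, result);
        printf("%zu matching rows\n", n);   /* expected output: 2 matching rows */
        return 0;
    }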
In this embodiment, the CPU chip and the FPGA chip can be connected through a PCIe physical medium, but a cache coherence protocol is used between them, so that the FPGA chip is allowed to directly access the host memory space at the host end; the problem of large data transmission delay in the prior art is thereby alleviated, and the acceleration effect can be further optimized. After receiving the data query instruction, the CPU chip of this embodiment determines the target data table to be queried and the data calculation rule used by the query, writes the target data table into the host memory space and transmits the data calculation rule to the FPGA chip, and the FPGA chip can directly read the target data table in the host memory space through the coherent cache interface and perform the data query operation corresponding to the data calculation rule to obtain the query result. In this process, the FPGA chip acquires the memory data directly through the coherent cache interface; the address space of the FPGA end does not need to be mapped into the memory address space of the CPU end, and the corresponding host memory space does not need to be accessed through a PCIe controller.
As a further introduction to the embodiment corresponding to fig. 1, a CPU chip in the heterogeneous computing acceleration platform is connected to a second coherent cache interface of the FPGA chip through a first coherent cache interface.
As a further introduction to the corresponding embodiment of fig. 1, after writing the target data table into the host memory space in S102, the CPU chip may further send notification information to the FPGA chip, so that the FPGA chip performs an operation of reading the target data table in the host memory space through the coherency cache interface.
As a further introduction to the embodiment corresponding to fig. 1, after performing a data query operation on the target data table according to the data calculation rule to obtain a query result, the FPGA chip may further store the query result in the host memory space through the coherent cache interface and record the memory address of the query result in the memory space; the FPGA chip may also send the memory address to the CPU chip, so that the CPU chip can read the query result according to the memory address. Specifically, after the FPGA chip stores the query result in the host memory space through the coherent cache interface, the FPGA chip may further send an interrupt signal to the CPU chip, so that the CPU chip reads the query result according to the received memory address.
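One possible form of this hand-off is sketched below in C, assuming a small descriptor in host memory that is coherently visible to both sides and that the FPGA fills in with the result address and length before notifying the CPU; the descriptor layout and field names are illustrative assumptions.

    #include <stdint.h>
    #include <stdio.h>

    struct row { uint32_t id; int64_t value; };

    /* Assumed descriptor in host memory, coherently visible to both sides;
     * the FPGA fills it in before notifying the CPU.                        */
    struct query_result_desc {
        volatile uint64_t result_addr;  /* host memory address of the result */
        volatile uint32_t result_len;   /* number of result rows             */
        volatile uint32_t done;         /* set by the FPGA when complete     */
    };

    /* Called by the CPU after it has been notified (interrupt or polling):
     * read the query result from the address recorded by the FPGA.          */
    static void cpu_read_query_result(const struct query_result_desc *desc)
    {
        const struct row *rows = (const struct row *)(uintptr_t)desc->result_addr;
        for (uint32_t i = 0; i < desc->result_len; i++)
            printf("row %u: id=%u value=%lld\n",
                   (unsigned)i, (unsigned)rows[i].id, (long long)rows[i].value);
    }

    int main(void)
    {
        /* Stand-in for what the FPGA would have written back. */
        static struct row results[2] = { {2, 150}, {3, 300} };
        struct query_result_desc desc = {
            .result_addr = (uint64_t)(uintptr_t)results,
            .result_len  = 2,
            .done        = 1,
        };
        if (desc.done)
            cpu_read_query_result(&desc);
        return 0;
    }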
The flow described in the above embodiment is explained below by way of an embodiment in practical use. Referring to fig. 2, fig. 2 is a schematic diagram of a processing system for implementing database acceleration based on cache coherence according to an embodiment of the present application.
In the scheme provided by this embodiment, the physical link layer of the hardware still uses the PCIe interface, so that the interfaces of existing accelerator cards remain compatible without modifying the hardware board; the scheme can be implemented simply by modifying the configuration logic in the FPGA, taking advantage of the programmability of the FPGA.
In the working path of the whole system, this embodiment adds two modules compared with the common PCIe-protocol-based architecture: a processor-side accelerator coherence processing interface and an FPGA-side interface IP. The functions of the two modules are introduced respectively below. The processor-side accelerator coherence processing interface can be implemented based on a cache coherence protocol, and enables the FPGA device to access the MC (Memory Controller) through the RC (PCIe Root Complex) on the CPU over the PCIe physical medium, thereby operating on the memory data. The FPGA-side interface IP is the counterpart of the processor-side accelerator coherence processing interface on the FPGA side and serves as a bridge between the FPGA and the CPU; it provides the acceleration function unit with address translation compatible with the CPU architecture and with caching of system memory.
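The practical effect of these two modules can be illustrated by contrasting the conventional PCIe offload flow with the coherent flow of this embodiment, as in the hedged C sketch below; the driver entry points shown are hypothetical stubs and only mark where a staging copy is or is not required.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Stub driver entry points; the names are hypothetical and merely mark
     * where data movement happens in each flow.                             */
    static void fpga_dma_copy_to_card(const void *src, size_t len)
    {
        (void)src;
        printf("PCIe path: stage and DMA %zu bytes into card memory\n", len);
    }
    static void fpga_submit_host_buffer(const void *buf, size_t len)
    {
        (void)buf;
        printf("coherent path: FPGA reads %zu bytes in place from host memory\n", len);
    }

    /* Conventional PCIe flow: the table is staged into a DMA buffer and
     * transferred into the accelerator's own address space.                 */
    static void offload_over_pcie(const void *table, size_t len)
    {
        void *dma_buf = malloc(len);          /* stand-in for a pinned DMA buffer */
        if (!dma_buf)
            return;
        memcpy(dma_buf, table, len);          /* extra copy on every query        */
        fpga_dma_copy_to_card(dma_buf, len);
        free(dma_buf);
    }

    /* Flow of this embodiment: the FPGA accesses the host buffer directly
     * through the coherence protocol, so only its address is handed over.   */
    static void offload_over_coherent_interface(const void *table, size_t len)
    {
        fpga_submit_host_buffer(table, len);  /* no staging copy, no remapping */
    }

    int main(void)
    {
        static const int table[4] = { 1, 2, 3, 4 };
        offload_over_pcie(table, sizeof(table));
        offload_over_coherent_interface(table, sizeof(table));
        return 0;
    }

With the coherent path, the software only hands over the address of the host buffer and the FPGA reads it in place, which is what removes the extra copy and the explicit address-space mapping.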
With these two modules, the data path of the system is built, and tasks suitable for FPGA acceleration can be offloaded to the FPGA for execution. The specific operation process is as follows:
Firstly, an open-source database is selected, which can be relational or non-relational, because when the task is offloaded, additional software needs to be developed to perform data packing; in addition, the data table and the relational expression to be operated on are determined, and the logic of the acceleration function unit in the FPGA is designed according to the relational expression.
When a user performs a query operation in the corresponding database, the software notifies the FPGA of information such as the operation rule by writing registers, according to the content of the query statement (data table type and calculation rule) and the self-defined data protocol agreed between the software and the FPGA; the software then packs the data table to be queried and writes it into the memory, where the size of the buffer opened each time can be determined according to the amount of data involved in each database operation and is specific to the user application; after the software has written the data into the memory, it notifies the FPGA that the data can be fetched;
After receiving the data-fetch notification, the FPGA reads the memory data through the coherent cache interface, caches the computation data in an internal FIFO, and notifies the accelerated computation function unit, which performs operations such as data comparison, addition, subtraction, multiplication and division according to the operation type in the register; after the computation is completed, the row data that meets the conditions is written back into the memory and the CPU is notified that processing is complete;
After receiving the operation completion notification, the CPU reads the useful information from the previously agreed memory address and feeds the result back to the user. A sketch of this host-side flow is given below.
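In the sketch, the job structure, the buffer sizes and the helper notify_fpga are assumptions standing in for the register writes and the doorbell notification of a real system; it only illustrates the order of the host-side operations described above.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    struct row { uint32_t id; int64_t value; };

    /* Hypothetical job structure placed in host memory; in a real system the
     * rule fields would be register writes and 'done' a status register.     */
    struct query_job {
        struct row        table_buf[64];  /* packed copy of the table to query */
        uint32_t          nrows;
        uint32_t          op_type;        /* data calculation rule: operation  */
        int64_t           operand;        /* data calculation rule: constant   */
        volatile uint32_t done;           /* completion flag set by the FPGA   */
        volatile uint32_t result_n;       /* number of matching rows           */
        struct row        result_buf[64]; /* rows written back by the FPGA     */
    };

    /* Stand-in for the doorbell / notification register write. */
    static void notify_fpga(struct query_job *job) { (void)job; }

    static void host_offload_query(struct query_job *job,
                                   const struct row *table, uint32_t nrows,
                                   uint32_t op_type, int64_t operand)
    {
        /* 1. Pack the data table to be queried and write it into host memory. */
        memcpy(job->table_buf, table, nrows * sizeof(*table));
        job->nrows = nrows;

        /* 2. Hand the data calculation rule to the FPGA (register writes). */
        job->op_type = op_type;
        job->operand = operand;
        job->done    = 0;

        /* 3. Notify the FPGA that the data can be fetched. */
        notify_fpga(job);

        /* 4. The CPU is now free for other work; once 'done' is set it reads
         *    result_buf from the previously agreed memory location.          */
    }

    int main(void)
    {
        static struct query_job job;
        struct row table[3] = { {1, 50}, {2, 150}, {3, 300} };
        host_offload_query(&job, table, 3, /* OP_GT = */ 1, 100);
        printf("query submitted, %u rows packed\n", (unsigned)job.nrows);
        return 0;
    }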
While the FPGA performs the computation, the CPU can handle other tasks, since the acceleration task is offloaded directly to the FPGA for execution. After the computation is finished, the FPGA can notify the CPU by means of an interrupt, or it can write a completion flag into a readable register, which the CPU obtains by polling. The specific mode is determined by the specific application and client and is communicated to the FPGA through the register, so the FPGA can be flexibly configured.
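A minimal C sketch of the polling alternative follows, in which a plain variable stands in for the readable completion register; in a real system the register would be read through the memory-mapped register space of the FPGA card.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical completion register; here a plain variable stands in for
     * what the FPGA would set to 1 when the operation has finished.          */
    static volatile uint32_t fpga_done_reg = 1;

    /* Polling alternative to the interrupt notification: the CPU reads the
     * readable register until the finished flag appears.                     */
    static int wait_for_fpga(unsigned long max_tries)
    {
        for (unsigned long i = 0; i < max_tries; i++) {
            if (fpga_done_reg & 0x1u)
                return 0;   /* FPGA reports that the operation is finished */
        }
        return -1;          /* gave up after max_tries reads                */
    }

    int main(void)
    {
        if (wait_for_fpga(1000000UL) == 0)
            printf("FPGA finished; the CPU may now read the query result\n");
        return 0;
    }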
The scheme provided by this embodiment overcomes the drawback of the original PCIe communication protocol, namely the large data communication delay. Communication delay is further reduced on top of the already excellent acceleration effect of the FPGA, so that an optimal acceleration effect can be achieved and user experience is improved. In this embodiment, a PCIe physical medium is used between the CPU and the FPGA, but a cache coherence protocol runs over it, which allows the FPGA to directly access the memory of the server, alleviates the problem of large data transmission delay in the prior art, and further optimizes the acceleration effect; this technology is applied to database acceleration. When this embodiment is used to accelerate a database, a flexible and variable acceleration function unit is integrated in the FPGA, which can be flexibly configured according to user needs and can support various types of databases and query statements without changing the hardware design, thereby saving labor and time cost.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a data query system according to an embodiment of the present application;
the system may include:
an instruction receiving module 100, configured to determine a target data table and a data calculation rule according to a data query instruction when the CPU chip receives the data query instruction;
the data transmission module 200 is used for writing the target data table into a host memory space and transmitting the data calculation rule to the FPGA chip;
and the data query module 300 is configured to control the FPGA chip to read the target data table in the host memory space through a coherent cache interface, and to perform a data query operation on the target data table according to the data calculation rule to obtain a query result, so as to return the query result to the CPU chip.
After receiving the data query instruction, the CPU chip of this embodiment determines the target data table to be queried and the data calculation rule used by the query, writes the target data table into the host memory space and transmits the data calculation rule to the FPGA chip, and the FPGA chip can directly read the target data table in the host memory space through the coherent cache interface and perform the data query operation corresponding to the data calculation rule to obtain the query result. In this process, the FPGA chip acquires the memory data directly through the coherent cache interface; the address space of the FPGA end does not need to be mapped into the memory address space of the CPU end, and the corresponding host memory space does not need to be accessed through a PCIe controller.
Further, the data transmission module 200 includes:
and the rule transmission module is used for transmitting the data calculation rule to the FPGA chip in a register writing mode.
Further, the instruction receiving module 100 includes:
and the target data table determining module is used for analyzing the data query instruction to obtain a target data table type and taking a data table corresponding to the target data table type in a database as the target data table.
Furthermore, the CPU chip is connected to a second coherent cache interface of the FPGA chip through a first coherent cache interface.
Further, the system further includes:
a notification module, used for the CPU chip to send notification information to the FPGA chip after the target data table is written into the host memory space, so that the FPGA chip performs the operation of reading the target data table in the host memory space through the coherent cache interface.
Further, the system further includes:
a result storage module, used for storing the query result in the host memory space through the coherent cache interface and recording the memory address of the query result in the memory space after the FPGA chip performs the data query operation on the target data table according to the data calculation rule to obtain the query result; the result storage module is further used for sending the memory address to the CPU chip, so that the CPU chip can read the query result according to the memory address.
Further, the system further includes:
an interrupt prompting module, used for the FPGA chip to send an interrupt signal to the CPU chip after the FPGA chip stores the query result in the host memory space through the coherent cache interface, so that the CPU chip can read the query result according to the received memory address.
This embodiment provides a communication interface based on cache coherence: the FPGA and the CPU can communicate with each other directly through a cache coherence protocol, and the CPU can share the system memory with the FPGA for direct use and access, which is equivalent to establishing a fast channel between the CPU and the FPGA accelerator card, making data communication simpler and more efficient. Using this communication mode, the database retrieval task is offloaded to the FPGA; the FPGA directly accesses the data in the database, starts operations such as retrieval and computation in parallel, writes the computed result directly into the memory, and informs the CPU that the computation is complete. The CPU can continue to execute other tasks during the FPGA retrieval, and the result is displayed to the user after the computation is completed. In this way, the performance of the database can be greatly improved, the user experience is improved, and the problem of the long delay of conventional PCIe-based communication can be solved.
Since the embodiment of the system part corresponds to the embodiment of the method part, the embodiment of the system part is described with reference to the embodiment of the method part, and is not repeated here.
The present application also provides a storage medium having a computer program stored thereon, which, when executed, may implement the steps provided by the above-described embodiments. The storage medium may include various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The application also provides a heterogeneous computing acceleration platform, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided by the foregoing embodiments when calling the computer program in the memory. Of course, the heterogeneous computing acceleration platform may further include various network interfaces, power supplies, and other components.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.