CN115878550A - A data processing method, chip, device and system - Google Patents
- Publication number: CN115878550A (CN202111152703.2A)
- Authority: CN (China)
- Prior art keywords
- processor core
- data processing
- processing request
- thread
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of this application disclose a data processing method for reducing data processing latency. The method is applied to a data processing system that includes a first device and a second device. The first device sends a data processing request to the second device, and the information carried in the request lets the second device determine whether to execute the request on a first processor core or on a second processor core. When the information indicates execution on the first processor core, the second device may schedule the first processor core to execute the request; when the information indicates execution on the second processor core, the second device may schedule the second processor core to execute it.
Description
Technical field
The embodiments of this application relate to the field of communications, and in particular to a data processing method, chip, device, and system.
Background
Remote direct memory access (RDMA) is a technology developed to address data processing latency inside devices during network transmission. RDMA moves data from a user application directly into a device's storage area and transfers it quickly over the network from one device to the memory of another. This eliminates multiple data copy operations during transmission, requires no intervention from the operating system on either side, and reduces the load on the device's central processing unit (CPU).
However, for some kinds of data processing, such as access to key-value data in a relational database, using RDMA may require multiple RDMA accesses, which results in high data access latency. In another technique, a device can use two-sided remote procedure call (RPC) to invoke the server's CPU to process the data (for example, to access key-value data). In this technique, one device sends a request to another device, which typically obtains the request through multiple polling (poll) threads on its CPU and then invokes an execution thread on the CPU to execute the request. Although this approach avoids the multiple network accesses of direct RDMA, the poll threads impose high overhead on the CPU, which limits the use of execution threads on the CPU and therefore also reduces data processing speed.
Summary of the invention
The embodiments of this application provide a data processing method, chip, device, and system for reducing processing latency.
A first aspect of the embodiments of this application provides a data processing method applied to a data processing system that includes a first device and a second device. The method includes: the second device receives a data processing request sent by the first device, where the second device includes a processor, the processor includes a first processor core and a second processor core, and the processing capability of the first processor core is greater than that of the second processor core; and the second device determines, according to information carried in the data processing request, to schedule the data processing request to the first processor core for processing or to schedule it to the second processor core for processing.
In the first aspect, the information carried in the data processing request enables the second device to determine whether the request sent by the first device is executed on the first processor core or on the second processor core of the second device. A suitable processor core can therefore be selected according to the type of the data processing request, which increases data processing speed.
In a possible implementation, the second processor core includes a polling thread and a scheduling thread, and the step in which the second device receives the data processing request sent by the first device specifically includes: the polling thread obtains, by polling, the data processing request sent by the first device from a receive queue of the second device; and the polling thread sends the data processing request to the scheduling thread.
In this implementation, the polling thread and the scheduling thread run on the second processor core. Because the second processor core has lower computing capability, its overhead is correspondingly lower, so the polling threads are cheap. More polling threads can be run without affecting the first processor core's processing of data processing requests, which reduces processing latency.
In a possible implementation, the first processor core and the second processor core each include execution threads. The step in which the second device determines, according to the information carried in the data processing request, to schedule the request to the first processor core or to the second processor core for processing specifically includes: the scheduling thread in the second processor core determines, according to the information carried in the data processing request, to schedule the request to an execution thread in the first processor core for processing or to an execution thread in the second processor core for processing.
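The hand-off described above (polling thread, then scheduling thread, then an execution thread on one of the two cores) can be sketched in Python as follows. This is only an illustrative model, not the patented implementation: real threads are replaced by plain function calls, and the names `poll_once`, `schedule_all`, and the `heavy` field are hypothetical stand-ins for the information carried in the request.

```python
from collections import deque

def poll_once(receive_queue, scheduler_inbox):
    """Polling-thread step: move at most one request to the scheduling thread."""
    if receive_queue:
        scheduler_inbox.append(receive_queue.popleft())
        return True
    return False

def schedule_all(scheduler_inbox, pick_core, first_core_queue, second_core_queue):
    """Scheduling-thread step: route each pending request to one execution queue."""
    while scheduler_inbox:
        request = scheduler_inbox.popleft()
        if pick_core(request) == "first":
            first_core_queue.append(request)
        else:
            second_core_queue.append(request)

# Hypothetical requests; "heavy" stands in for the information carried in a request.
receive_queue = deque([{"id": 1, "heavy": True}, {"id": 2, "heavy": False}])
scheduler_inbox, first_q, second_q = deque(), deque(), deque()

while poll_once(receive_queue, scheduler_inbox):
    pass  # keep polling until the receive queue is drained
schedule_all(scheduler_inbox,
             lambda r: "first" if r["heavy"] else "second",
             first_q, second_q)
```

In this model both the polling step and the scheduling step would run on the second processor core, so the first processor core only ever sees requests that were routed to its execution queue.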
In a possible implementation, the data processing request is a database access request for accessing a database in the second device, and the database access request is any one of the following: a data write request, a data read request, a data update request, a data delete request, a file lock request, or a data retrieval request.
In this implementation, the data processing request may be a data write request, a data read request, a data update request, a data delete request, a file lock request, or a data retrieval request, and may also be a request to use a remote resource, among others. The embodiments of this application can therefore be applied to a variety of scenarios, which improves user experience.
In a possible implementation, the information carried in the data processing request includes a function identifier. The step in which the second device determines, according to the information carried in the data processing request, to schedule the request to the first processor core or to the second processor core for processing specifically includes: the second device determines, according to the function identifier and preset information in the second device, to schedule the data processing request to the first processor core for processing or to the second processor core for processing.
In this implementation, the second device stores preset information, and the preset information combined with the function identifier determines whether the data read request is executed on the first processor core or on the second processor core. The preset information may be registration information produced by registering the identifiers of functions that require little computing capability. The second device uses it to distinguish data read requests with different computing requirements and to process them on the first processor core or the second processor core accordingly. The first device does not need to be modified, which reduces cost.
In a possible implementation, before the step in which the second device receives the data processing request sent by the first device, the method further includes: the second device registers function identifiers to generate the preset information.
In this implementation, the second device registers some of the function identifiers, and the preset information consists of the registered function identifiers, which improves the feasibility of the solution.
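A minimal sketch of registration and lookup, assuming the preset information is simply a set of registered function identifiers. The mapping direction (registered identifier goes to the second processor core, anything else to the first) follows the detailed description; the identifier values and the names `register_function` and `choose_core` are hypothetical.

```python
# The "preset information" kept by the second device: identifiers of functions
# that need little computing capability (assumed to be a plain set here).
registered_low_cost_ids = set()

def register_function(function_id):
    """Register a function identifier before any request is received."""
    registered_low_cost_ids.add(function_id)

def choose_core(function_id):
    """Registered identifiers run on the second core, all others on the first."""
    return "second" if function_id in registered_low_cost_ids else "first"

register_function(0x01)  # hypothetical identifier of a cheap lookup function
```

Because the lookup happens entirely on the second device, the first device can keep sending ordinary function identifiers without knowing which core will serve them.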
In a possible implementation, the step in which the polling thread obtains, by polling, the data processing request sent by the first device from the receive queue of the second device includes: the polling thread polls at least one doorbell register in the processor and obtains the data processing request from a receive queue bound to a first doorbell register.
In this implementation, the polling thread only needs to poll the at least one doorbell register and does not need to poll the large number of receive queues of the second device, which reduces polling overhead.
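The saving can be illustrated with a small model in which one doorbell is bound to many receive queues. This is a sketch under stated assumptions: the `Doorbell` class, the idea that the doorbell records which bound queue received a request, and the queue count of 64 are all hypothetical and are not taken from the application.

```python
from collections import deque

class Doorbell:
    """Toy doorbell register bound to a group of receive queues."""
    def __init__(self, bound_queues):
        self.bound_queues = bound_queues  # receive queues bound to this doorbell
        self.pending = deque()            # indices of queues holding new requests

    def ring(self, queue_index, request):
        """Model the hardware enqueuing a request and ringing the doorbell."""
        self.bound_queues[queue_index].append(request)
        self.pending.append(queue_index)

def poll_doorbells(doorbells):
    """Check each doorbell once; return fetched requests and the check count."""
    fetched, checks = [], 0
    for bell in doorbells:
        checks += 1
        while bell.pending:
            queue = bell.bound_queues[bell.pending.popleft()]
            fetched.append(queue.popleft())
    return fetched, checks

queues = [deque() for _ in range(64)]
bell = Doorbell(queues)
bell.ring(17, "read key=a")
bell.ring(42, "read key=b")
fetched, checks = poll_doorbells([bell])
```

In this model a single doorbell check replaces a scan over all 64 receive queues, which is the overhead reduction the paragraph above refers to.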
In a possible implementation, after the step in which the scheduling thread in the second processor core determines, according to the information carried in the data processing request, to schedule the request to an execution thread in the second processor core for processing, the method further includes: when a preset condition is met, the scheduling thread of the second processor core invokes an execution thread of the first processor core to process the data processing request, where the preset condition indicates that the execution thread of the second processor core cannot execute the data processing request.
In this implementation, when the data processing request cannot continue to execute in the execution thread of the second processor core, the scheduling thread of the second processor core can also schedule an execution thread of the first processor core to execute it. Compared with having the operating system schedule between the first processor core and the second processor core, this can reduce scheduling latency.
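The fallback path can be sketched as below. The application does not say what the preset condition is, so the step limit used here is purely a hypothetical example, as are the names `CannotExecute`, `run_on_second_core`, `run_on_first_core`, and `schedule`.

```python
class CannotExecute(Exception):
    """Stands in for the preset condition: the second core cannot run the request."""

def run_on_second_core(request):
    # Hypothetical condition: requests with more than 4 steps exceed what the
    # second core's execution thread can handle.
    if request["steps"] > 4:
        raise CannotExecute(request["id"])
    return ("second", request["id"])

def run_on_first_core(request):
    return ("first", request["id"])

def schedule(request):
    """Scheduling thread: try the second core first, fall back to the first core."""
    try:
        return run_on_second_core(request)
    except CannotExecute:
        return run_on_first_core(request)
```

The fallback decision is taken by the scheduling thread itself rather than by the operating system scheduler, which is where the paragraph above locates the latency benefit.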
A second aspect of the embodiments of this application provides a processor chip that can implement the method of the first aspect or of any possible implementation of the first aspect. The processor chip includes the corresponding first processor core and second processor core for performing the method.
A third aspect of the embodiments of this application provides a computing device that includes a processor coupled to a memory. The memory stores instructions that, when executed by the processor, cause the computing device to implement the method of the first aspect or of any possible implementation of the first aspect. The computing device may be, for example, a network device, or a chip or chip system that supports a network device in implementing the method.
A fourth aspect of the embodiments of this application provides a computer-readable storage medium that stores instructions. When the instructions are executed, a computer performs the method provided by the first aspect or by any possible implementation of the first aspect.
A fifth aspect of the embodiments of this application provides a computer program product that includes computer program code. When the computer program code is executed, a computer performs the method provided by the first aspect or by any possible implementation of the first aspect.
A sixth aspect of the embodiments of this application provides a system that includes a first device and a second device in communication with each other. The first device is configured to send a data processing request to the second device, and the second device can perform the method provided by the first aspect or by any possible implementation of the first aspect.
Description of drawings
FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of a data processing system provided by an embodiment of this application;
FIG. 3 is a schematic diagram of a data processing method provided by an embodiment of this application;
FIG. 4 is an internal flowchart provided by an embodiment of this application;
FIG. 5 is another internal flowchart provided by an embodiment of this application;
FIG. 6 is a schematic structural diagram of a processor chip provided by an embodiment of this application;
FIG. 7 is a schematic structural diagram of a computing device provided by an embodiment of this application.
Detailed description
The embodiments of this application provide a data processing method, chip, device, and system for reducing processing latency.
The embodiments of this application are described below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, the claims, and the accompanying drawings of this application are used to distinguish between similar objects and do not necessarily describe a particular order or sequence. Terms used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
The word "exemplary" is used here to mean "serving as an example, embodiment, or illustration". Any embodiment described here as "exemplary" is not necessarily to be construed as preferred over or better than other embodiments.
Some concepts involved in the data processing method provided in the embodiments of this application are explained first.
Remote direct memory access (RDMA) is a technology developed to address server-side data processing latency in network transmission. RDMA moves data from a user application directly into the storage area of a device (for example, a server) and transfers it quickly over the network from one device to the memory of another, remote device. This eliminates multiple data copy operations inside a device during transmission, requires no intervention from the operating system of either device, and reduces the load on the central processing unit (CPU). RDMA currently supports several data read and write modes, including send/receive, RDMA read, and RDMA write.
RDMA provides point-to-point communication based on message queues. Each device can obtain messages from other devices directly, without intervention from the operating system or the protocol stack. The messaging service is built on channel connections created between the two communicating parties. When the two parties need to communicate, a channel connection is created, and the two endpoints of each channel are two queue pairs (QPs). Each QP consists of a send queue (SQ) and a receive queue (RQ). In addition to these two basic queues, RDMA provides a completion queue (CQ), which records the results of sending the messages in the SQ and of receiving the messages in the RQ.
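The relationship between the SQ, RQ, and CQ can be modeled in a few lines of Python. This is a conceptual toy, not the RDMA verbs API: the `Endpoint` class and `transfer` function are hypothetical, and attaching one CQ to each endpoint is a simplification made for the sketch.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Endpoint:
    """One end of a channel: a queue pair (SQ + RQ) plus a completion queue."""
    send_queue: deque = field(default_factory=deque)
    receive_queue: deque = field(default_factory=deque)
    completion_queue: deque = field(default_factory=deque)

def transfer(sender, receiver):
    """Move one message from the sender's SQ to the receiver's RQ.

    Both sides record the outcome in their completion queues, mirroring the
    CQ's role of reporting send results and receive results.
    """
    message = sender.send_queue.popleft()
    receiver.receive_queue.append(message)
    sender.completion_queue.append(("send_done", message))
    receiver.completion_queue.append(("recv_done", message))

client, server = Endpoint(), Endpoint()
client.send_queue.append("GET key=42")
transfer(client, server)
```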
FIG. 1 is a schematic diagram of a system architecture to which the embodiments of this application apply. As shown in FIG. 1, multiple devices form a remote data processing system. The devices include device 0, device 1, device 2, device 3, ..., device N. Each device has a network interface controller (NIC), and the NICs on the devices communicate over a network. Each device can act as a client that initiates remote access requests and also as a server that receives access requests; a device can also act only as a client or only as a server.
For data whose storage and retrieval are implemented with structures such as linked lists or trees (for example, key-value pairs), accessing the data requires multiple rounds of pointer chasing, that is, multiple main memory accesses. Because a service request requires multiple main memory accesses, two-sided remote procedure call (RPC) is commonly used for remote data access, thereby implementing pointer chasing and data access.
Take two-sided RPC as an example. The client uses two-sided RPC to invoke the server's CPU to access the data, for example key-value data. That is, the client sends a request to the server; the server obtains the request through multiple polling (poll) threads, then invokes an execution thread to execute the request, and returns the data obtained from the multiple key-value accesses to the client. Although this approach avoids the multiple network accesses of repeated one-sided RDMA operations, the large number of poll threads that the CPU must run imposes high overhead, which limits the use of execution threads and therefore increases processing latency.
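A small worked example makes the trade-off concrete. It assumes a linked list in which every node visited costs one main memory access; with one-sided RDMA each of those accesses would be a separate network round trip, whereas with two-sided RPC the server walks the list locally and replies once. The `Node` class, the `lookup` function, and the three-node list are hypothetical illustrations.

```python
class Node:
    """A key-value node in a singly linked list."""
    def __init__(self, key, value, next_node=None):
        self.key, self.value, self.next = key, value, next_node

def lookup(head, key):
    """Walk the list and return (value, number_of_nodes_visited)."""
    visited, node = 0, head
    while node is not None:
        visited += 1
        if node.key == key:
            return node.value, visited
        node = node.next
    return None, visited

head = Node("a", 1, Node("b", 2, Node("c", 3)))
value, visited = lookup(head, "c")

one_sided_round_trips = visited  # one remote read per node followed
rpc_round_trips = 1              # server chases pointers locally, replies once
```

The round-trip saving of RPC grows with the depth of the structure, but it is paid for with server CPU time, which is the overhead the method described here aims to reduce.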
To solve the above problem, the embodiments of this application provide a data processing method. A data processing system that performs the method is shown in FIG. 2. The system includes a first device and a second device that communicate with each other. There may be multiple first devices, for example first device 0, first device 1, ..., first device N. The second device includes a communication component, memory, and a processor. The processor includes a first processor core with high computing capability and a second processor core that can run multiple threads. It should be understood that the first processor core may include multiple processor cores with relatively high computing capability, so the first processor core may also be called a first processor core cluster; likewise, the second processor core may include multiple processor cores and may also be called a second processor core cluster. The computing capability of the first processor core is higher than that of the second processor core. The first processor core includes poll threads and execution threads. The second processor core may include poll threads, scheduling threads, and execution threads; it also includes one or more doorbell registers bound to the poll threads, and an on-chip buffer that caches requests for the execution threads. The on-chip buffer is optional, which is not limited here. The memory includes a first-processor-core queue that receives requests for the execution threads of the first processor core, a second-processor-core queue that receives requests for the execution threads of the second processor core, and request (REQ)/result (RSLT) queues for receiving requests and sending results, for example REQ/RSLT 0, REQ/RSLT 1, ..., REQ/RSLT N. The communication component supports the transmission of requests and results between the REQ/RSLT queues and the first devices.
For example, the processor in the second device may be a big-little core processor, that is, a processor that uses the big.LITTLE structure of ARM (Advanced RISC Machines) processors, which are core components of embedded systems based on the reduced instruction set computing (RISC) architecture. big.LITTLE is a heterogeneous multi-core processor architecture. In this architecture, a "big cluster" (the first processor core) composed of power-hungry cores with strong computing capability is combined with a "LITTLE cluster" (the second processor core) composed of low-power cores with weaker computing capability. These cores share memory regions, and workloads can be scheduled and switched between the CPU clusters online in real time.
The communication component in the second device may be a network interface card, a peripheral component interconnect express (PCIe) physical link, a high-capacity communication system (HCCS) physical link, or another communication component.
It should be understood that the first device and the second device shown in FIG. 2 may be separate servers, or relatively independent computing modules within one server or across different servers. When both are servers, the data processing system may be an equipment room, a rack, or may be formed by two remote data centers.
FIG. 3 is a schematic diagram of a data processing method provided by an embodiment of this application. The method includes:
301. The first device initiates a data processing request to the second device, and correspondingly the second device receives the data processing request from the first device.
In the embodiments of this application, the first device can generate a data processing request according to the function of the second device that it needs to invoke and send the request to the second device, and the second device can perform the corresponding function according to the request. For example, the data processing request may be a database access request for accessing a database in the second device, and the database access request is any one of the following: a data write request, a data read request, a data update request, a data delete request, a file lock request, or a data retrieval request.
This embodiment uses a data read request as an example, and the data read request is an RDMA request. In this case, the REQ/RSLT queues of the first device and the second device in FIG. 2 may be RQ/SQ, and the communication components of the first device and the second device are NICs. The first device sends the data read request to the second device through its SQ. The data read request includes the key of the target data that the first device wants to read from the second device. The data read request may also include a first identifier or a second identifier, where the first identifier indicates that the data read request is to be executed on the first processor core and the second identifier indicates that it is to be executed on the second processor core. Alternatively, when the data read request contains no first identifier, the second device may determine from the request to execute it on the second processor core; or, when the data read request contains no second identifier, the second device may determine from the request to execute it on the first processor core. This is not limited here.
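The identifier-based variants can be sketched with a single function. The encoding below (a `core_id` field holding the string `"first"` or `"second"`) is an assumption made for illustration; the application does not specify how the identifiers are represented. The `default_core` parameter models the two alternatives in which a missing identifier implies one core or the other.

```python
FIRST_ID, SECOND_ID = "first", "second"  # hypothetical identifier encodings

def core_for(request, default_core):
    """Pick the core named by the request's identifier, else the default core."""
    marker = request.get("core_id")
    if marker == FIRST_ID:
        return "first"
    if marker == SECOND_ID:
        return "second"
    return default_core  # no identifier present: fall back to the agreed default

# A read request carries the key of the target data plus an optional identifier.
explicit = {"key": "user:42", "core_id": FIRST_ID}
implicit = {"key": "user:43"}
```

With `default_core="second"`, only requests that explicitly carry the first identifier reach the first processor core, which corresponds to the variant in which a missing first identifier implies execution on the second processor core.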
This embodiment takes as an example a data read request that includes a first identifier or a second identifier, where the first identifier indicates that the data read request is to be executed on the first processor core and the second identifier indicates that it is to be executed on the second processor core. For example, the first identifier and the second identifier may also be identifiers obtained by processing the function identifier (functionID) originally included in the RDMA request. The function identifier identifies the function that the first device wants to invoke on the second device, and the function invoked by the function identifier corresponding to the first identifier requires more computing power than the function invoked by the function identifier corresponding to the second identifier. The second device stores preset information, and the preset information together with the function identifier determines whether the data read request is executed on the first processor core or on the second processor core. The preset information may be registration information in which this embodiment registers the identifiers of functions that require little computing power; correspondingly, the preset information includes only the first identifier. The second device uses the preset information to distinguish data read requests that require different amounts of computing power: when the function identifier in the data read request is contained in the preset information, the second device executes the data read request on the second processor core; when it is not contained in the preset information, the second device executes the data read request on the first processor core. The registration may be performed by the second device, which then notifies the first device of the registered function identifiers; or by the first device, which then notifies the second device; or by a third-party device, which notifies both the first device and the second device. Alternatively, the first device need not be aware of the registration process or its result. This embodiment of the application does not limit this.
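The dispatch rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `Core`, `REGISTERED_LIGHT_FUNCTIONS`, and `select_core`, as well as the example function identifier values, are hypothetical and do not appear in the application.

```python
from enum import Enum


class Core(Enum):
    BIG = "first processor core"
    LITTLE = "second processor core"


# Hypothetical preset information: function IDs registered as needing
# little computing power. The concrete values are made up for illustration.
REGISTERED_LIGHT_FUNCTIONS = {0x01, 0x02}


def select_core(function_id, registered=REGISTERED_LIGHT_FUNCTIONS):
    """Pick the core for a request from its function ID and the preset info.

    A function ID found in the registered set runs on the second (little)
    core; any other function ID runs on the first (big) core.
    """
    return Core.LITTLE if function_id in registered else Core.BIG
```

Keeping the registered set on the second device alone matches the variant in which the first device is unaware of the registration result.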
The target data is, for example, data in the linked-list or tree structures described above; that is, the location of the target data in the memory of the second device has to be resolved by following pointers multiple times. Target data of different sizes requires different amounts of computing power on a big-little core processor: target data that requires little computing power can be assigned to the second processor core of the big-little core processor, and target data that requires more computing power can be assigned to the first processor core.
The data read request may also include a read address or a write address of the target data, which is not limited here. A user process on the first device can initiate an acquisition request for the target data. The user process rings the doorbell register of the first device's NIC so that the NIC obtains the acquisition request, adds the above function identifier to it to form the data read request, and then sends the data read request to the second device through the SQ. In this embodiment, the user process may also ring the doorbell register through middleware. The middleware may be a unified communications exchange (UCX), which promotes rapid development by providing a high-level application programming interface (API), hiding low-level details while maintaining high performance and scalability.
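As a rough illustration of a request that carries a function identifier alongside the key of the target data, the sketch below packs both into a fixed header. The wire layout (a 16-bit function ID followed by a 64-bit key, network byte order) and the names `HEADER`, `build_request`, and `parse_request` are assumptions made for this example; the application does not specify a request format.

```python
import struct

# Hypothetical header layout: 2-byte function ID, 8-byte key, big-endian,
# no padding. This is not a format defined by the application or by RDMA.
HEADER = struct.Struct("!HQ")


def build_request(function_id, key):
    """First-device side: attach the function ID to the acquisition request."""
    return HEADER.pack(function_id, key)


def parse_request(payload):
    """Second-device side: recover the function ID and key from the header."""
    function_id, key = HEADER.unpack(payload[:HEADER.size])
    return function_id, key
```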
In this embodiment, the NIC of the second device receives the data read request through the RQ. After step 301, the NIC of the second device rings a doorbell register in the second processor core of the big-little core processor according to the mapping between QPs and doorbell registers; the mapping indicates that each QP pair in the second device is bound to one doorbell register. The second processor core also includes multiple poll threads. The number of doorbell registers depends on the number of poll threads required: it may equal the number of poll threads, or each poll thread may correspond to several doorbell registers. Each poll thread only needs to poll the doorbell registers bound to it; when such a doorbell register is rung, the poll thread fetches the data read request from the QP bound to that doorbell register.
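The QP-to-doorbell binding and the behaviour of a poll thread can be modelled in software as follows. This is a simplified sketch, not the hardware mechanism: the class `Nic`, the function `poll_once`, and the string identifiers for QPs and doorbells are hypothetical, and a real doorbell is a register rather than a dictionary entry.

```python
from collections import deque


class Nic:
    """Toy model of the second device's NIC with one doorbell per QP."""

    def __init__(self, binding):
        self.binding = dict(binding)  # qp_id -> doorbell_id
        self.receive_queues = {qp: deque() for qp in self.binding}
        self.doorbells = {db: False for db in self.binding.values()}

    def receive(self, qp_id, request):
        # Place the request in the RQ, then ring the doorbell bound to the QP.
        self.receive_queues[qp_id].append(request)
        self.doorbells[self.binding[qp_id]] = True


def poll_once(nic, my_doorbells):
    """One pass of a poll thread over only the doorbells bound to it."""
    fetched = []
    for db in my_doorbells:
        if not nic.doorbells[db]:
            continue
        nic.doorbells[db] = False  # acknowledge the doorbell
        for qp_id, bound_db in nic.binding.items():
            if bound_db == db:
                queue = nic.receive_queues[qp_id]
                while queue:
                    fetched.append(queue.popleft())
    return fetched
```

Because each poll thread scans only its own doorbells, a thread that owns several doorbells still never touches queues that belong to another thread.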
302. The second device determines, according to the information carried in the data processing request, to schedule the data processing request to the first processor core for processing or to the second processor core for processing.
The information carried in the data processing request may indicate that the second device should process the request on the first processor core or on the second processor core. After receiving the data processing request, the second device may determine to schedule the first processor core to process it when the information indicates the first processor core, and may determine to schedule the second processor core to process it when the information indicates the second processor core.
Take a data read request that accesses a database on the second device as an example. When the second device finds the function identifier contained in the data processing request among the registered function identifiers, the second device can execute the data read request directly on the second processor core, which uses the key of the target data to trace the location of the target data through multiple pointer dereferences and then reads the target data. When the second device does not find the function identifier among the registered function identifiers, the second device can execute the data read request on the first processor core.
The main functions of the second processor core can be divided into three parts: the poll thread, the scheduling thread, and the execution thread. The poll thread sends the data read request to the scheduling thread. After receiving the data read request, the scheduling thread matches the function identifier in the request against the registered function identifiers. If the match succeeds, the scheduling thread checks how full the on-chip buffer of the second processor core is: when the on-chip buffer is full, it places the data read request in the second processor core queue; when the on-chip buffer is not full and the second processor core queue is empty, it writes the read command directly into the on-chip buffer. The execution thread fetches the data read request from the on-chip buffer and, using the key of the target data in the request, traces the read location of the target data through multiple pointer dereferences.
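The admission logic of the scheduling thread can be sketched as below. The class `LittleCoreScheduler` and its method names are hypothetical. The text does not say how requests move from the second processor core queue into the on-chip buffer, so the refill step in `take` is an assumption chosen to keep requests in arrival order.

```python
from collections import deque


class LittleCoreScheduler:
    """Toy model of the on-chip buffer plus the second processor core queue."""

    def __init__(self, buffer_capacity):
        self.buffer_capacity = buffer_capacity
        self.on_chip_buffer = deque()
        self.core_queue = deque()  # holds requests while the buffer is full

    def admit(self, request):
        """Scheduling thread: place a matched request, report where it went."""
        buffer_full = len(self.on_chip_buffer) >= self.buffer_capacity
        if buffer_full or self.core_queue:
            self.core_queue.append(request)
            return "queue"
        # Buffer has room and nothing is waiting: write directly.
        self.on_chip_buffer.append(request)
        return "buffer"

    def take(self):
        """Execution thread: fetch the next request from the on-chip buffer."""
        if not self.on_chip_buffer:
            return None
        request = self.on_chip_buffer.popleft()
        # Assumption: a freed slot is refilled from the queue right away.
        if self.core_queue:
            self.on_chip_buffer.append(self.core_queue.popleft())
        return request
```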
For the execution process when the function identifier matches, refer to the internal flowchart shown in FIG. 4. The first device includes a user thread and a NIC, and the second device includes a NIC, a second processor core, and memory. The second processor core includes a poll thread, a scheduling thread, and an execution thread, and the memory stores tree-structured data made up of a root node, intermediate nodes, and leaf nodes. The user thread of the first device rings the doorbell register of the first device's NIC, so that the NIC sends the data read request to the NIC of the second device. The NIC of the second device places the data read request in the RQ and rings the doorbell register of the poll thread, so that the poll thread forwards the data read request in the RQ to the scheduling thread, and the scheduling thread schedules the execution thread according to the request. The execution thread performs pointer chasing based on the key of the target data: through multiple memory accesses it obtains the intermediate node pointed to by the root node and the leaf nodes pointed to by the intermediate node, and among those leaf nodes it finds the node that contains the key of the target data, thereby obtaining the target data. The execution thread then rings the doorbell register of the second device's NIC, so that the NIC transmits the target data to the NIC of the first device over the connection between the first device and the second device. The NIC of the first device receives the target data through its RQ and stores the receive result in the CQ. The user thread of the first device polls the CQ of the first device, and when the CQ indicates that reception succeeded, the user thread obtains the target data from the RQ.
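The pointer chasing performed by the execution thread can be illustrated with a small in-memory tree. The `Node` layout and the `lookup` function are assumptions for this sketch (a B+-tree-like structure with sorted separator keys); the application does not prescribe a node format. The returned hop count makes the number of dependent memory accesses visible.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None, values=None):
        self.keys = keys          # sorted separators (internal) or stored keys (leaf)
        self.children = children  # list of child nodes; None marks a leaf
        self.values = values      # values aligned with keys; leaf only


def lookup(root, key):
    """Follow child pointers from the root to a leaf; return (value, hops)."""
    node, hops = root, 0
    while node.children is not None:
        # Each step depends on the node just loaded: one more memory access.
        node = node.children[bisect_right(node.keys, key)]
        hops += 1
    if key in node.keys:
        return node.values[node.keys.index(key)], hops
    return None, hops
```

A two-level example: with a root separator of 9 and intermediate separators of 5 and 11, looking up key 7 visits the root, the left intermediate node, and then the leaf that stores 7.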
In this embodiment, the second processor core can be implemented in various forms, for example as an area-optimized Advanced RISC Machine (ARM) core or a fifth-generation reduced instruction set computer (RISC-V) core. Each thread is fully functional and can implement functions such as polling, scheduling, and execution, and the threads switch at the hardware level through the microarchitecture; for example, hardware arbitration schedules shared resources such as the L/S interface and the arithmetic and logic unit (ALU). The high-performance first processor core can also be implemented in various forms, for example as a high-performance x86, ARM, or RISC-V core. Where the multi-threaded second processor core is placed depends on requirements: it may be on the same CPU die as the high-performance big core or on a different die.
When the function identifier in the data read request does not match any registered function identifier, the scheduling thread of the second processor core schedules the first processor core to process the data read request, which prevents the weaker processing performance of the second processor core from degrading the execution of coarse-grained RPCs. Specifically, after the scheduling thread of the second processor core determines that the first processor core is to process the data read request, it forwards the request to a first processor core queue and records a usage flag for that queue. The scheduling thread can record an empty/non-empty flag, with the non-empty flag serving as the usage flag that indicates the first processor core queue is in use. In other words, the scheduling thread of the second processor core maintains the empty/non-empty flags of all first processor core queues, and the first processor core only needs to poll those flags in the scheduling thread of the second processor core rather than polling every command queue, so far fewer poll threads are occupied than with direct polling. When the first processor core polls a non-empty flag, it fetches the data read request from the command queue and executes it on an execution thread of the first processor core, looking up the target data by its key.
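The flag-based hand-off to the first processor core can be modelled as follows. The class `BigCoreQueues` and its methods are hypothetical, and the choice that the polling side clears the flag after draining a queue is an assumption; the text only states that the scheduling thread maintains the flags and that the first processor core polls them.

```python
from collections import deque


class BigCoreQueues:
    """Toy model of first processor core queues with empty/non-empty flags."""

    def __init__(self, queue_count):
        self.queues = [deque() for _ in range(queue_count)]
        # One flag per queue, set by the little core's scheduling thread.
        self.non_empty = [False] * queue_count

    def forward(self, queue_index, request):
        """Scheduling thread: enqueue the request and record the usage flag."""
        self.queues[queue_index].append(request)
        self.non_empty[queue_index] = True

    def poll(self):
        """Big core: scan the compact flag array, then drain flagged queues."""
        fetched = []
        for index, flag in enumerate(self.non_empty):
            if flag:
                fetched.extend(self.queues[index])
                self.queues[index].clear()
                self.non_empty[index] = False  # assumption: poller clears it
        return fetched
```

Scanning one flag array instead of every command queue is what lets a small number of poll threads cover many queues.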
In this embodiment, in the scenario where the function identifier matches, the second processor core may find while executing the data processing request that the request meets a preset condition. The preset condition may be that the service or function to be invoked is not supported on the second processor core, or is supported but performs poorly, or that the second processor core cannot supply the computing power the data processing request requires. An example is a B+ tree split triggered by an insert: the second processor core supports the insert function but does not have enough computing power for the tree split that the insert causes. In this case, the second processor core can invoke an execution thread of the first processor core to perform the operation.
Refer to another internal flowchart shown in FIG. 5, which takes a data write request as the data processing request. The first device includes a user thread and a NIC, and the second device includes a NIC, a second processor core, a first processor core, and memory. The second processor core includes a poll thread, a scheduling thread, and an execution thread, and the first processor core includes a poll thread and an execution thread. The user thread of the first device initiates the data write request and rings the doorbell register of the first device's NIC, so that the NIC sends the data write request to the second device over the connection between the two devices. The NIC of the second device receives the data write request through the RQ and rings the doorbell register bound to that RQ; the poll thread in the second processor core that is bound to this doorbell register then fetches the data write request from the RQ. The data write request includes a function identifier indicating processing on the second processor core, so the scheduling thread of the second processor core schedules the execution thread of the second processor core to execute the request. The execution thread traces the insertion position of the target data in memory through multiple pointer dereferences. When it determines that the node in memory can hold all of the target data, it inserts the target data directly and sends a request-completion response to the NIC of the second device. The NIC of the second device forwards the response to the NIC of the first device, and the user thread of the first device polls the first device's NIC to obtain the response and confirm that the write has completed. When the execution thread determines that the node in memory cannot hold all of the target data, so that the target data has to be split across other nodes, that is, when the write would cause a tree split, the scheduling thread of the second processor core can schedule the first processor core to execute the data write request. For example, the second processor core repackages the data write request together with the current insertion position into a new command and returns it to the scheduling thread. The scheduling thread places the repackaged request in the queue of the big-core cluster, where it waits to be picked up by a big core, and records the usage flag. After the poll thread of the second processor core polls the usage flag, it fetches the repackaged command (the data write request plus the current insertion position) from the first processor core queue and forwards it to the execution thread of the first processor core. That execution thread performs the tree split according to the repackaged command and then returns the request-completion response to the user thread of the first device through the NIC of the second device and the NIC of the first device.
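The escalation path for a write that would split a node can be sketched as follows. Everything here is an illustrative assumption: a leaf is modelled as a plain dictionary, `LEAF_CAPACITY` is an arbitrary limit, the repackaged command is a dictionary, and `big_core_split` only returns the two resulting leaves and the separator key rather than updating parent nodes.

```python
LEAF_CAPACITY = 4  # hypothetical maximum number of entries per leaf


def little_core_insert(leaf, key, value, big_core_queue):
    """Second-core execution thread: insert in place or hand off the split."""
    if key in leaf or len(leaf) < LEAF_CAPACITY:
        leaf[key] = value
        return "done"
    # The write would split the node: repackage the request together with
    # the insertion position and queue it for the first processor core.
    big_core_queue.append({"key": key, "value": value, "leaf": leaf})
    return "escalated"


def big_core_split(command):
    """First-core execution thread: split the full leaf around its median."""
    merged = dict(command["leaf"])
    merged[command["key"]] = command["value"]
    items = sorted(merged.items())
    mid = len(items) // 2
    left, right = dict(items[:mid]), dict(items[mid:])
    separator = items[mid][0]  # smallest key of the right leaf
    return left, right, separator
```

Passing the already located leaf in the command means the first processor core does not have to repeat the pointer chasing that the second processor core has completed.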
In this embodiment, the first device sends a data processing request to the second device, and the information carried in the request indicates whether the second device should execute the request on the first processor core or on the second processor core. When the information indicates the first processor core, the second device can schedule the first processor core to execute the request; when it indicates the second processor core, the second device can schedule the second processor core to execute it. Requests that need little computing power on the second device benefit from the low overhead of the second processor core, which supports more concurrent data processing and lowers processing latency, while requests that need more computing power benefit from the high processing capability of the first processor core, which ensures the processing result.
The data processing method has been described above; the processor chip that executes the method is described below.
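Refer to FIG. 6, which is a schematic structural diagram of a processor chip provided by an embodiment of this application. The processor chip 60 includes a first processor core 601 and a second processor core 602, and the processing capability of the first processor core 601 is greater than that of the second processor core 602;
The second processor core 602 is configured to receive a data processing request originating from a first device, where the first device is a device communicatively connected to the second device in which the processor chip is located;
The second processor core 602 is further configured to determine, according to the information carried in the data processing request, to schedule the data processing request to the first processor core 601 for processing or to the second processor core 602 for processing.
Optionally, the second processor core 602 includes a polling thread and a scheduling thread, and the second processor core 602 is specifically configured to:
poll by using the polling thread, and obtain, from the receive queue of the second device, the data processing request sent by the first device;
send the data processing request to the scheduling thread by using the polling thread.
Optionally, the first processor core 601 and the second processor core 602 include execution threads, and the second processor core 602 is specifically configured to: use the scheduling thread to determine, according to the information carried in the data processing request, to schedule the data processing request to the execution thread in the first processor core 601 for processing or to the execution thread in the second processor core 602 for processing.
Optionally, the data processing request is a database access request for accessing a database in the second device, and the database access request is any one of the following: a data write request, a data read request, a data update request, a data delete request, a file lock request, or a data retrieval request.
Optionally, the information carried in the data processing request includes a function identifier;
The second processor core 602 is specifically configured to determine, according to the function identifier and the preset information in the second device, to schedule the data processing request to the first processor core 601 for processing or to the second processor core 602 for processing.
Optionally, the processor chip further includes at least one doorbell register 603: the second processor core 602 is configured to poll the at least one doorbell register by using the polling thread, and to obtain the data processing request from the receive queue bound to a first doorbell register.
Optionally, the second processor core 602 is further configured to:
after using the scheduling thread to determine, according to the information carried in the data processing request, to schedule the data processing request to the execution thread in the second processor core 602, and when a preset condition is met, use the scheduling thread to invoke the execution thread of the first processor core 601 to process the data processing request, where the preset condition indicates that the execution thread of the second processor core 602 cannot execute the data processing request.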
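FIG. 7 is a schematic diagram of a possible logical structure of a computing device 70 provided by an embodiment of this application. The computing device 70 includes a processor 701, a communication interface 702, a storage system 703, and a bus 704. The processor 701, the communication interface 702, and the storage system 703 are connected to one another through the bus 704. In this embodiment, the processor 701 controls and manages the actions of the computing device 70; for example, the processor 701 performs the steps performed by the second device in the method embodiment of FIG. 3. The communication interface 702 supports communication by the computing device 70. The storage system 703 stores the program code and data of the computing device 70.
The processor 701 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various example logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor 701 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The bus 704 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses can be classified as address buses, data buses, control buses, and so on. For ease of illustration, only one thick line is shown in FIG. 7, but this does not mean that there is only one bus or one type of bus.
The first processor core 601, the second processor core 602, and the doorbell register 603 in the processor chip 60 correspond to components of the processor 701 in the computing device 70.
The computing device 70 of this embodiment may correspond to the second device in the method embodiment of FIG. 3. The communication interface 702 in the computing device 70 can implement the functions and/or steps of the second device in the method embodiment of FIG. 3; for brevity, details are not repeated here.
Another embodiment of this application further provides a computer-readable storage medium that stores computer-executable instructions. When a processor of a device executes the computer-executable instructions, the device performs the steps of the data processing method performed by the second device in the method embodiment of FIG. 3.
Another embodiment of this application further provides a computer program product that includes computer-executable instructions stored in a computer-readable storage medium. When a processor of a device executes the computer-executable instructions, the device performs the steps of the data processing method performed by the second device in the method embodiment of FIG. 3.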
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computing device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Claims (17)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111152703.2A CN115878550A (en) | 2021-09-29 | 2021-09-29 | A data processing method, chip, device and system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111152703.2A CN115878550A (en) | 2021-09-29 | 2021-09-29 | A data processing method, chip, device and system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115878550A true CN115878550A (en) | 2023-03-31 |
Family
ID=85756230
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111152703.2A Pending CN115878550A (en) | 2021-09-29 | 2021-09-29 | A data processing method, chip, device and system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115878550A (en) |
- 2021-09-29: CN application CN202111152703.2A, published as CN115878550A, status: active, pending
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118550736A (en) * | 2024-07-30 | 2024-08-27 | 鹏钛存储技术(南京)有限公司 | A communication method among multiple CPUs |
| CN118550736B (en) * | 2024-07-30 | 2024-10-15 | 鹏钛存储技术(南京)有限公司 | Communication method among multiple CPUs |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||