
CN111459418B - An RDMA-based key-value storage system transmission method - Google Patents


Info

Publication number
CN111459418B
CN111459418B (application CN202010413800.1A)
Authority
CN
China
Prior art keywords
key
data
rdma
client
value
Prior art date
Legal status
Active
Application number
CN202010413800.1A
Other languages
Chinese (zh)
Other versions
CN111459418A (en)
Inventor
蒋源
施凌鹏
唐斌
叶保留
陆桑璐
卢士达
张露维
胡钧毅
Current Assignee
Nanjing University
State Grid Shanghai Electric Power Co Ltd
Original Assignee
Nanjing University
State Grid Shanghai Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by Nanjing University, State Grid Shanghai Electric Power Co Ltd filed Critical Nanjing University
Priority to CN202010413800.1A priority Critical patent/CN111459418B/en
Publication of CN111459418A publication Critical patent/CN111459418A/en
Application granted granted Critical
Publication of CN111459418B publication Critical patent/CN111459418B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659 Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0613 Improving I/O performance in relation to throughput
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)

Abstract


Figure 202010413800

The invention discloses an RDMA-based key-value storage system transmission method. The method combines an in-memory key-value storage system with high-performance RDMA hardware, adopts the fastest one-sided semantic, the RDMA write operation, and redesigns the GET and PUT instructions of the key-value storage system: instruction and data are encapsulated together in a single write command, so that a data access completes in only one round trip, avoiding the multi-round-trip latency of the original read-based designs; the return phase now bypasses the client kernel, improving the experience of the client's upper-layer applications. At the same time, to address the problem that one-sided operations bypass the remote kernel, so that read-write contention cannot be scheduled centrally by the remote CPU, a priority queue is introduced into the server-side RDMA memory, converting the parallel instructions of multiple clients into prioritized serial processing and thereby solving the server's inability to arbitrate read-write contention when the kernel is bypassed by one-sided write commands.


Description

RDMA (Remote Direct Memory Access)-based key-value storage system transmission method
Technical Field
The invention belongs to the technical field of computer storage, and in particular relates to an RDMA-based key-value storage system transmission method.
Background
With the maturing of cloud computing and big-data processing technologies, the volume of data generated by Internet applications is growing exponentially. Meanwhile, with the rise of pictures and short videos, this data is diverse in format, varied in size, and largely unstructured. Querying, analyzing, and permanently storing such growing masses of data requires higher-performance storage technology. Traditional relational databases, with their low concurrent processing capability, poor scalability, and fixed storage structure, are ill-suited to the storage demands of this new, format-dispersed, unstructured data model.
For this reason, non-relational (NoSQL) key-value storage systems have come into wide industrial use as mainstream storage and analysis tools. In-memory key-value storage systems in particular, thanks to their high access speed and strong scalability, are widely applied to accelerate many data-processing workloads, both online analytical workloads and offline data-intensive workloads. For example, they serve as primary storage (e.g., Redis and RAMCloud) or as a front-end cache of a back-end database (e.g., Memcached) to speed up data reads and writes. They are also used in the upper-layer applications (e.g., HBase) of big-data analysis tools such as Hadoop and Spark to support unstructured data storage.
However, facing ever-growing data volumes and the heavy computational load of processing large-scale data, traditional TCP/IP network protocols and hardware devices can no longer keep pace with high-performance cores and high-speed links (100 Gb/s). Network I/O performance and strained computing resources are becoming the bottlenecks of key-value storage systems.
Thus, efficient network hardware and more advanced transport protocols are being introduced into conventional key-value storage systems. As the price of high-performance computing hardware has fallen, data centers have begun adopting technologies such as RDMA (Remote Direct Memory Access) to improve the transmission and computing performance of in-memory key-value storage systems. RDMA operations allow one machine to read from (or write to) a pre-registered memory region of another machine without involving the remote CPU. Compared with traditional message passing, RDMA achieves minimal round-trip delay (microseconds), the highest throughput, and the lowest CPU overhead. Combining RDMA with a key-value storage system can greatly accelerate online processing and lighten data-intensive workloads. RDMA now also supports the RoCE (RDMA over Converged Ethernet) protocol, an extension that lets RDMA hardware run over an Ethernet link layer, making RDMA high-performance hardware compatible with traditional Ethernet and thus well suited for introduction into conventional key-value storage systems.
Several problems remain to be solved when using RDMA-based key-value storage systems for data transfer. Tests show that data transmission between nodes takes 1-3 microseconds while a memory lookup takes only 60-120 nanoseconds, so transmission latency dominates: transmission efficiency directly determines the overall read-write performance of the key-value storage system. Yet most recent research on RDMA-based key-value stores performs remote memory access in RDMA read mode, as in the transfer schemes of the Pilaf and FaRM systems. An RDMA read bypasses the remote server's kernel, but it also prevents the remote side from performing complex addressing, so a data transfer between client and server must be completed in multiple round trips. The latency of such multi-round-trip designs is clearly higher than that of a single-round-trip design, and the user experience degrades accordingly. The multiple round trips inherent in the RDMA read mode therefore greatly reduce the overall performance of the key-value storage system.
In addition, while an RDMA read-based operation makes multiple round trips, and although it bypasses the remote server's kernel (which is also one reason the remote side cannot perform complex addressing), those repeated transmissions cause interrupts and thread switches on the client CPU, which is usually shared by more than one application, so the experience of users at the application layer is greatly reduced. The server, by contrast, exists to provide service: its CPU does not suffer from excessive application switching, so excessively pursuing a perfect bypass of the server-side kernel gets the priorities backwards.
Disclosure of Invention
In order to solve the problems in the prior art, the invention aims to provide an RDMA-based key-value storage system transmission method that effectively reduces the round-trip communication delay of an in-memory key-value storage system, improves throughput, and improves the client's upper-layer user experience by using RDMA one-sided semantics to bypass the client kernel.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, a method for RDMA-based key-value storage system transmission includes the following steps:
the client and the server establish a connection with each other: the server registers an RDMA memory region to create a command queue, the client registers an RDMA memory region to receive returned data blocks, and the two sides exchange memory addresses and access keys;
after the connection is successfully established, the client sends GET/PUT instructions to the server in one-sided write semantic form;
the server receives the parallel processing requests of multiple clients and stores them in the command queue, then parses and responds to the data in the command queue according to RDMA one-sided write semantics: for a GET instruction, it sends the value data to the client's memory while bypassing the client kernel; for a PUT instruction, it adds or updates the value store locally.
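These steps can be sketched as a toy, in-process model (all names here are illustrative, not the patent's; a Python dict stands in for registered RDMA memory, and appending to a list models the one-sided write into the server's command queue):

```python
def client_send(command_queue, request):
    """One one-sided RDMA write: the request lands in the server's queue."""
    command_queue.append(request)

def server_process(command_queue, store, client_mem):
    """Server side: drain the queue, apply PUTs locally, and answer GETs
    with a single (simulated) write into the client's registered receive
    buffer `client_mem` -- the return path that bypasses the client kernel."""
    while command_queue:
        req = command_queue.pop(0)
        if req["op"] == "PUT":
            store[req["key"]] = req["value"]
        elif req["op"] == "GET":
            # write straight into the buffer the client named via l_address
            client_mem[req["l_address"]] = store.get(req["key"])
```

A PUT followed by a GET thus costs the client one write each; no client-side completion handling is modeled.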
In a preferred embodiment, the client sends the GET/PUT command in one-sided write semantic form by calling an RDMA write function, whose parameters include:
r_address, the virtual memory mapping transmitted from the server;
r_key, the access key transmitted from the server; and
data, the information of the request, which, beyond distinguishing the operation type, contains the information that operation type requires.
As a preferred implementation, for a GET request, the data includes:
command, used to distinguish the request type;
key, the key of the requested target object in the key-value storage system, used to look up the value's address space in the remote index;
l_address, the address space in the client's memory used to store the returned data; and
l_key, the client's access key.
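As a rough sketch (not the patent's actual wire format), the four GET fields above can be packed into one buffer that a single RDMA write deposits in the server's command queue; the field sizes here are assumptions:

```python
import struct

# Hypothetical layout of the GET request buffer (sizes are illustrative):
#   command    1 byte   (0 = GET, 1 = PUT)
#   key       16 bytes, zero-padded
#   l_address  8 bytes  client receive-buffer address
#   l_key      4 bytes  client memory access key
GET_FMT = "<B16sQI"

def pack_get(key: bytes, l_address: int, l_key: int) -> bytes:
    """Encode a GET request as one buffer, sent with a single RDMA write."""
    return struct.pack(GET_FMT, 0, key.ljust(16, b"\x00"), l_address, l_key)

def unpack_get(buf: bytes):
    """Server-side parse of the same buffer taken from the command queue."""
    command, key, l_address, l_key = struct.unpack(GET_FMT, buf)
    return command, key.rstrip(b"\x00"), l_address, l_key
```

Because command, key, and the client's return coordinates travel together, the server needs no further exchange with the client before answering.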
As a preferred embodiment, for the GET request, parsing and responding to the data in the command queue according to RDMA one-sided write semantics includes:
the server, in the thread processing the instruction queue, takes the received data and parses the parameters in the data;
determining from command that the instruction is GET, and creating a response function RDMA-write(l_address, l_key, r_data);
accessing the hash table by key to obtain the address mapping of the storage block where the corresponding value resides, fetching the value from the storage block at that mapped address, and packaging it into the r_data of the response function;
filling the parsed l_address and l_key directly into the l_address and l_key of the response function;
and, after the key successfully matches the client's l_key, sending the data to the client memory designated by l_address while bypassing the client kernel, whereupon the client receives the GET result.
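These response steps can be simulated in miniature (illustrative names only; `client_mem` stands in for the client's registered buffer, and the final assignment models the one-sided write that bypasses the client kernel):

```python
def handle_get(request, hash_table, storage, client_mem, client_l_key):
    """Server-side GET response: index lookup, value fetch, l_key match,
    then a (simulated) one-sided write back to the client."""
    block_addr = hash_table[request["key"]]    # hash table -> storage-block address
    r_data = storage[block_addr]               # fetch value from the storage block
    if request["l_key"] != client_l_key:       # access key must match before writing
        raise PermissionError("l_key mismatch")
    client_mem[request["l_address"]] = r_data  # deliver the GET result
```

The l_key check models the access-key verification that guards the bypassed client kernel.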
As a preferred implementation, for the PUT request, the data includes:
command, used to distinguish the request type;
key, the key in the key-value storage system of the data block the request needs to write;
value, the value in the key-value storage system of the data block the request needs to write.
In a preferred embodiment, for a PUT request, parsing and responding to the data in the command queue according to RDMA one-sided write semantics includes:
the server, in the thread processing the instruction queue, takes the received data and parses the parameters in the data;
determining from command that the instruction is PUT, starting an index-access thread to execute the write operation, and creating a new <new_key, new_value> key-value pair in the hash table;
writing the parsed key into new_key of the new key-value pair;
and, according to the parsed value, allocating a new data storage block in the memory area, copying the value into the new block, and writing the block's access address into new_value of the key-value pair.
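A minimal simulation of these PUT steps (illustrative names; a list index stands in for the address of a freshly allocated storage block):

```python
def handle_put(request, hash_table, storage):
    """Server-side PUT: allocate a new storage block for the value and
    record its address in a new <new_key, new_value> hash-table entry."""
    new_value = len(storage)          # address of the newly built block
    storage.append(request["value"])  # copy value into the new block
    hash_table[request["key"]] = new_value
```

Note that the hash table stores only the block address, matching the indirection the steps above describe.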
As a preferred embodiment, when sending a request the client assigns a priority level according to task urgency and sends the priority flag bit together with the data block to the server; the server's command queue receives the flag bits and data blocks, then takes them out serially and processes them in priority order.
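The priority scheme can be sketched with a heap (a sketch only: the patent's flag bit is modeled as an integer where a lower number means more urgent, and a sequence counter preserves FIFO order among equal priorities):

```python
import heapq

class CommandQueue:
    """Toy model of the server's priority command queue."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: FIFO within one priority level

    def push(self, priority, request):
        """A client 'writes' its flag bit plus data block into the queue."""
        heapq.heappush(self._heap, (priority, self._seq, request))
        self._seq += 1

    def drain(self):
        """Serial take-out: yield requests strictly in priority order."""
        while self._heap:
            yield heapq.heappop(self._heap)[2]
```

Draining serially is what converts the clients' parallel writes into the prioritized serial processing the claim describes.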
In a second aspect, a data processing apparatus, the apparatus comprising:
one or more processors;
a memory;
and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and when executed by the processors implement the RDMA-based key-value storage system transmission method of the first aspect of the invention.
In a third aspect, a computer readable storage medium stores computer instructions which, when executed by a processor, implement the RDMA-based key-value storage system transfer method of the first aspect of the invention.
Compared with the traditional TCP/IP communication protocol and with other RDMA semantic designs, this design needs only one round-trip transmission, processes the command queue automatically, and completes data access while freeing the client's CPU overhead. The method is applicable wherever an in-memory key-value storage system serves as a database engine in an RDMA hardware environment.
Drawings
FIG. 1 is a schematic diagram of a method of transmission of an RDMA-based key-value storage system, according to an embodiment of the invention;
FIG. 2 is a schematic diagram illustrating a client and a server establishing a connection with each other according to an embodiment of the present invention;
FIG. 3 is a diagram of a command queue and a polling thread according to an embodiment of the invention;
FIG. 4 is a schematic diagram of a request phase of a GET instruction according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the response and return phases of a GET instruction, according to an embodiment of the invention;
FIG. 6 is a schematic diagram of a PUT instruction client sending phase according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a phase of a server-side PUT instruction response according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
Fig. 1 is a general schematic diagram of the transmission method of an RDMA-based key-value storage system according to an embodiment of the invention. The method redesigns the GET and PUT instructions of the key-value storage system using the higher-performance RDMA write semantics, thereby avoiding multiple round trips, reducing transmission delay, and improving throughput. At the same time, the server parses the GET command and, after obtaining the operation the command requires, returns the data to the client with RDMA write, now bypassing the client kernel and so freeing CPU overhead for the user. The following describes how the GET and PUT commands are redesigned to fully exploit RDMA's high-performance semantics and improve the transmission performance of the key-value storage system.
Fig. 2 is a schematic diagram of the client and server establishing a connection according to an embodiment of the invention. First, the server starts an RDMA memory-registration thread to create a Command Queue that buffers the GET instructions sent by multiple clients via RDMA write, and sends the server memory address r_address and access key r_key of this region to each client in advance to establish the connection. At the same time, each client starts its own RDMA memory-registration thread and creates a memory block for receiving the data block (Receive Data) returned when a GET instruction completes. The client's memory address l_address and access key l_key are likewise sent to the server in advance, establishing the connection ahead of time, i.e., a connection-oriented data transmission protocol. Once the connection is established, the remote kernel can be bypassed and the memory block accessed using the address and key.
After the connection is established, the remote memory is virtually abstracted into the address space of the local network card; to an upper-layer application, accessing remote memory storage is equivalent to issuing commands and operations against local memory, with the implementation details handled jointly by the RDMA NIC protocol and hardware.
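The pre-shared addresses and keys can be modeled as follows (a sketch under the assumption that the out-of-band exchange simply hands each side the peer's region descriptor; `Region`, `establish`, and the view dicts are illustrative names):

```python
from dataclasses import dataclass

@dataclass
class Region:
    """A registered RDMA memory region as advertised to the peer."""
    address: int  # start address the remote side may target
    key: int      # access key required for one-sided access

def establish(server_queue: Region, client_recv: Region):
    """Each side learns the other's (address, key) before any one-sided
    write: the client gets (r_address, r_key) for the command queue, and
    the server gets (l_address, l_key) for the client's receive buffer."""
    client_view = {"r_address": server_queue.address, "r_key": server_queue.key}
    server_view = {"l_address": client_recv.address, "l_key": client_recv.key}
    return client_view, server_view
```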
Because the high-performance one-sided RDMA write semantics are adopted, the remote kernel is not notified when remote memory is accessed during data transmission, which yields higher transmission efficiency. But precisely because the kernel is not notified, whether for the one-sided-semantics GET instruction above or the one-sided-semantics PUT instruction, the read-write contention that arises when multiple clients concurrently access the server's data storage area cannot be immediately coordinated and resolved by the bypassed server kernel. The design shown in Fig. 3 is therefore proposed.
FIG. 3 is a diagram illustrating the command queue and polling thread according to an embodiment of the invention. First, RDMA allows a region of memory to be opened in advance for buffering data sent directly by one-sided clients; this region is defined as a message queue in the server's memory for receiving the data of parallel clients. The client assigns a priority level according to task urgency, writes a priority flag bit into the work queue, and sends it to the server together with the data. After receiving the flag bits and data blocks, the server takes them out serially and processes them in priority order. Because the client's RDMA write semantics bypass the server kernel, this serial take-out step cannot be performed automatically by the server kernel and needs the assistance of a new polling thread; moreover, by the design of the key-value store's GET and PUT instructions, accessing the index structure requires a small amount of CPU processing. Therefore, for accessing the hash table, the invention additionally creates a polling auxiliary thread p, started as the first step of the whole process: thread p polls the server's RDMA command-queue memory region to check whether new client requests have arrived, so that client requests are polled periodically and processed in priority order. The one-sided RDMA write semantics and the message-queue processing scheme are thus mutually compatible and adaptive, handling the distributed read-write contention of multiple clients under kernel-free reception.
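One iteration of thread p might look like this (a sketch only: a plain list stands in for the command-queue memory region, and the dispatch table mapping the command field to a handler is an illustrative addition):

```python
def poll_once(command_queue, handlers):
    """One polling pass: drain whatever requests have landed in the queue
    region and dispatch each by its command field. Returns the number
    handled, so a real polling thread could back off when nothing arrived."""
    handled = 0
    while command_queue:
        request = command_queue.pop(0)          # serial take-out
        handlers[request["command"]](request)   # GET or PUT processing
        handled += 1
    return handled
```

A real polling thread would run poll_once in a loop against the registered RDMA region rather than a Python list.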
This queue is named the "command queue" and the thread the "polling thread". The direct communication stage, namely the concrete design of the GET and PUT instructions in the in-memory key-value storage system, follows.
FIG. 4 is a diagram illustrating the request phase of a GET instruction according to an embodiment of the invention. With the connection already established, the client actively initiates and enters the request phase of the GET instruction: the client's RNIC starts RDMA write communication semantics and calls the request function write(r_address, r_key, data). The first parameter r_address is the mapping of the virtual memory transmitted from the remote side when the connection was established; the remote memory can be accessed directly through it. The second parameter r_key is an agreed key introduced for the security of the kernel bypass: once this key matches and is verified against the remote server, remote memory can be read and written without notifying the remote kernel. The third parameter data is stored in the server's Command Queue, registered in advance by the server RNIC to receive the request commands that clients place in data. The parameter data mainly comprises four parts:
1) command: the specific GET instruction content, indicating the access nature of the operation.
2) key: the key in the key-value storage system, used to look up the value's address space in the index.
3) l_key: after the value is obtained, a thread is started to return the data to the client; the client key l_key protects the client's memory from unauthorized access, and key matching lets the return bypass the client kernel without interrupting the client's current application-layer thread.
4) client receiving address: the client-side address space l_address for receiving the returned value data; in the data-return stage this address is abstracted as a memory mapping on the server side, so the return of the data can be completed directly by a one-sided write without notifying the client's CPU.
The clients' GET requests are received uniformly in the command queue, after which the server polls the data_i in the queue for further processing. Server-assisted processing is required because RDMA alone cannot support pointer chasing and index queries. The server therefore creates the Command Queue and parses the parameters of each data_i received in the queue.
FIG. 5 is a diagram illustrating the response and return phases of a GET instruction according to an embodiment of the invention. As shown in the figure, the server-side kernel intervenes: it takes a request data_i out of the receive queue in priority order and parses it. The first parameter command of data_i indicates whether the instruction is GET or PUT; if GET, the response function RDMA-write(l_address, l_key, r_data) is created for the subsequent return of the value. The second parameter key addresses the index hash table, whose stored key-value pairs yield the address mapping of the value corresponding to the keyword key. Following this address mapping, the value is fetched from its storage block and packaged into the r_data parameter of the previously created response function. The third and fourth parameters of data_i are written directly into the l_key and l_address of the RDMA-write response function. As described above, l_address serves as the address of the client memory region that receives the data, while l_key is the matching key required to access the client's memory while bypassing the client kernel. Once the RDMA-write response function connects to the client, l_key matching is performed; upon a successful match, the value data is transmitted directly into the client memory designated by l_address, completing the GET process. At this point, the GET request initiated by the client has finally received the server's response, and the value is returned to local memory with the kernel bypassed.
The above process describes the request phase and the response-and-return phase of the invention's complete GET instruction based on high-performance RDMA write semantics. Folding the request phase of Fig. 4 into the request function initiated in the first step of Fig. 5 makes every step of the GET instruction visible. Compared with other RDMA-based key-value storage work, the RDMA write semantics with the lowest communication delay are introduced as the basic communication semantics of the whole process, and the request and return phases are optimized so that the whole GET instruction completes in a single round trip instead of several. Multiple transmissions cause interrupts and thread switches on the client CPU, which usually serves more than one application, so they sharply degrade the application-layer user experience. Meanwhile, the invention's design switches the bypass to the client kernel, freeing the client kernel that runs the most software applications and improving the experience most tangible to the user in front of the client. The server, for its part, exists to provide service: its CPU fully carries the storage system's work-instruction load, avoiding the traditional RDMA read semantics' over-pursuit of a perfect server-kernel bypass at the expense of effectiveness.
Fig. 6 is a schematic diagram of the client-side sending stage of a PUT instruction according to an embodiment of the invention. When a client needs to write new data into a value storage block of the server's key-value storage system, or to update existing old data, it uses a PUT instruction. The flow of the PUT instruction is much simpler than the GET instruction's three phases of request, response, and return. As at the start of the GET command, RDMA between client and server must still establish a connection-oriented communication mode in advance. To reduce the complexity of the PUT request function and save memory resources, the PUT and GET instructions share the server's index hash table, value storage blocks, and work-request receive queue (Command Queue). Because the instructions share this buffer area, the receive queue does not change according to whether the request is a GET or a PUT operation, so work requests in the queue are still named data_i; unlike a GET operation, a work request data_i under a PUT operation carries only three parameters: command, key, and value. The server polls the data_i received in the queue and parses the work request. For differentiation, the first byte of data_i, command, indicates whether the request is a GET or a PUT operation. The memory-registration part of the command queue was already elaborated in the description of the GET instruction and is not repeated here. After memory registration completes, the client knows the remote server's access address r_address and remote memory access key r_key; with these parameters it can directly access server memory via RDMA write semantics, write the data into the pre-allocated server command queue, and generate a work request data_i in the queue.
The sending stage of the PUT operation can be designed on this basis. Because the operation is one-sided, the client terminates its thread as soon as the request function RDMA-write(r_address, r_key, data) has been issued.
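The client-side packing of the PUT work request can be sketched as follows. The exact field widths are assumptions for illustration; the patent specifies only that the first byte of data_i carries the command so the shared queue can tell GET and PUT apart, followed by the key and value parameters.

```python
import struct

# Illustrative wire layout for the PUT work request data_i:
#   [1B command][2B key length][key bytes][4B value length][value bytes]
# The first byte distinguishes the operation; field sizes are assumed.
OP_GET, OP_PUT = 0, 1

def pack_put(key: bytes, value: bytes) -> bytes:
    return (struct.pack("!BH", OP_PUT, len(key)) + key +
            struct.pack("!I", len(value)) + value)

def unpack(req: bytes):
    op = req[0]
    (klen,) = struct.unpack("!H", req[1:3])
    key = req[3:3 + klen]
    (vlen,) = struct.unpack("!I", req[3 + klen:7 + klen])
    value = req[7 + klen:7 + klen + vlen]
    return op, key, value

req = pack_put(b"new_key", b"new_value")
assert req[0] == OP_PUT                                  # first byte = command
assert unpack(req) == (OP_PUT, b"new_key", b"new_value")
# After one RDMA-write(r_address, r_key, req) the client thread may exit:
# the PUT is one-sided and expects no response message from the server.
```

Placing the command in the fixed first byte lets the server's polling thread branch on a single load before parsing the rest of the request.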
Fig. 7 is a schematic diagram illustrating the response phase of the PUT command at the server according to an embodiment of the present invention. After the server receives the work request data_i, a polling thread of the server kernel is started; the thread fetches and parses the work requests data_i in order of their priority. Whether the work request is a GET command or a PUT command is determined from the parsed first parameter command; here the command is a PUT command. An index access thread is started to execute the write operation, and a <new_key, new_value> key-value pair is created in the hash table. The second parameter of the work request data_i is parsed to obtain the key, which is written into the new_key of the newly created key-value pair. The third parameter of the work request data_i is parsed to obtain the data block value to be written; a new data storage block is allocated in the server storage area, the value parameter is copied into the new block, and the access address of the block is written into the key-value pair's new_value, completing the update and write of the key-value-pair index structure and the storage block.
The client has thus successfully added (or updated) a key-value pair in the server's key-value storage system. Because the client closes the related thread after sending completes, the server sends no subsequent message to the client about the newly added index structure or the expansion of the storage area, so the client still approximates kernel bypass, as with the GET instruction: CPU resource occupation is greatly reduced, and the freed computing resources provide a better upper-layer experience for a client switching among many applications. Meanwhile, with no messages going back and forth, transmission delay is minimized and the transmission efficiency of the whole workload is improved; together with the GET operation designed above, the in-memory key-value storage system achieves high-performance transmission.
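The server-side PUT handling described above can be sketched as a small simulation. No real RDMA, threads, or allocator are used; the bump allocator and the structure names are hypothetical stand-ins for the patent's index access thread and storage area.

```python
# Sketch of the server-side PUT response: the polling thread drains the
# command queue, and for each PUT creates a <new_key, new_value> pair in
# the hash index, where new_value holds the address of a freshly
# allocated storage block containing the written data.

command_queue = [("PUT", "new_key", b"new_value_bytes")]
hash_index = {}          # key -> storage-block address (new_value)
storage_blocks = {}      # address -> data block contents
next_addr = [0x2000]     # naive bump allocator for block addresses

def handle_put(key, value):
    addr = next_addr[0]
    next_addr[0] += len(value)           # carve out a new storage block
    storage_blocks[addr] = bytes(value)  # copy the value into the block
    hash_index[key] = addr               # new_value = block address

def poll_loop():
    while command_queue:
        command, key, value = command_queue.pop(0)  # parse data_i
        if command == "PUT":
            handle_put(key, value)

poll_loop()
print(storage_blocks[hash_index["new_key"]])   # -> b'new_value_bytes'
```

Keeping only the block address in the index entry (rather than the value itself) is what allows the GET path to resolve a key to a mapped address and return the value with a single RDMA write.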
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that although the present invention has been described in detail with reference to the above embodiments, the methods of the present invention are applicable to all such systems, and it should be understood by those skilled in the art that modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (4)

1. An RDMA-based key-value storage system transfer method, the method comprising the steps of:
the client side and the server side are connected with each other, the server side registers an RDMA memory for creating a command queue, and the client side registers the RDMA memory for receiving a return data block and mutually transmitting a memory address and an access key;
after the connection is successfully established, the client sends a GET/PUT instruction to the server in a unilateral write semantic form;
the server side receives parallel processing requests of multiple clients and stores the requests in a command queue, analyzes and responds data in the command queue according to RDMA unilateral write semantics, and sends value data to a memory of the client side in a mode of bypassing a kernel of the client side for a GET instruction; for a PUT instruction, the value store is added or updated locally,
wherein the client sending the GET/PUT instruction in unilateral write semantic form is realized by calling an RDMA write function, whose parameters comprise:
r_address, which is the virtual memory mapping transmitted from the server,
r_key, which is the access key transmitted from the server,
data, which is the information of the request; on the basis of distinguishing the operation type, it contains the information needed by that operation type,
for a GET request, the data includes:
a command for distinguishing request types;
the key, which is the key of the target object of the request in the key-value storage system and is used for finding the value address space in the index at the opposite end;
l_address, which is the address space in the client memory used for storing the returned data; and
l_key, which is the client access key;
for a PUT request, the data includes:
a command for distinguishing request types;
key, which is the key of the data block required to be written in the request in the key value storage system;
value, which is the value of the data block required to be written in the current request in the key value storage system;
for the GET request, parsing and responding to the data in the command queue according to RDMA unilateral write semantics comprises the following steps:
the server creates a thread to process the instruction queue, takes the received data, and parses the parameters in the data;
according to command, determining that the command is GET, and creating a response function RDMA-write(l_address, l_key, r_data);
accessing the hash table according to the key to obtain the address mapping of the storage block where the corresponding value is located, taking the value out of the storage block according to the mapped address, and packaging it into the r_data of the response function;
directly filling the parsed l_address and l_key into the l_address and l_key of the response function;
after the key is successfully matched with the client's l_key, the data is sent, bypassing the client kernel, to the client memory specified by l_address, and the client receives the GET result;
for a PUT request, parsing and responding to the data in the command queue according to RDMA unilateral write semantics comprises:
the server creates a thread to process the instruction queue, takes the received data, and parses the parameters in the data;
determining that the command is PUT according to command, starting an index access thread to execute the write operation, and creating a <new_key, new_value> key-value pair in the hash table;
writing the parsed key into the new_key of the newly created key-value pair;
and according to the parsed value, allocating a new data storage block in the memory area, copying the value into the new block, and writing the access address of the block into the key-value pair's new_value.
2. The RDMA-based key-value storage system transmission method according to claim 1, wherein when sending a request, the client assigns priority levels according to task urgency and sends a priority flag bit together with the data block to the server, and the server command queue receives them, fetches them serially, and processes them in order of priority.
3. A data processing apparatus, characterized in that the apparatus comprises:
one or more processors;
a memory;
and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by the one or more processors, the programs when executed by the processors implement the RDMA-based key-value storage system transfer method of any of claims 1-2.
4. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the RDMA-based key-value storage system transfer method of any of claims 1-2.
CN202010413800.1A 2020-05-15 2020-05-15 An RDMA-based key-value storage system transmission method Active CN111459418B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010413800.1A CN111459418B (en) 2020-05-15 2020-05-15 An RDMA-based key-value storage system transmission method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010413800.1A CN111459418B (en) 2020-05-15 2020-05-15 An RDMA-based key-value storage system transmission method

Publications (2)

Publication Number Publication Date
CN111459418A CN111459418A (en) 2020-07-28
CN111459418B true CN111459418B (en) 2021-07-23

Family

ID=71681974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010413800.1A Active CN111459418B (en) 2020-05-15 2020-05-15 An RDMA-based key-value storage system transmission method

Country Status (1)

Country Link
CN (1) CN111459418B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116171429A (en) * 2020-09-24 2023-05-26 华为技术有限公司 Apparatus and method for data processing
CN112486996B (en) * 2020-12-14 2022-08-05 上海交通大学 Object-Oriented Memory Data Storage System
CN112597195B (en) * 2020-12-16 2024-08-13 中国建设银行股份有限公司 Cache data refreshing method and distributed system
CN112817887B (en) * 2021-02-24 2021-09-17 上海交通大学 Far memory access optimization method and system under separated combined architecture
CN113259439B (en) * 2021-05-18 2022-05-06 中南大学 Key value scheduling method based on receiving end drive
CN115374024A (en) * 2021-05-21 2022-11-22 华为技术有限公司 A memory data sorting method and related equipment
CN113626184B (en) * 2021-06-30 2025-02-21 济南浪潮数据技术有限公司 A hyper-convergence performance optimization method, device and equipment
CN113568908B (en) * 2021-07-16 2024-11-19 华中科技大学 A key-value request parallel scheduling method and system
CN113608895B (en) * 2021-08-06 2024-04-09 湖南快乐阳光互动娱乐传媒有限公司 Web back-end data access method and system
CN114328453A (en) * 2021-12-27 2022-04-12 奇安信科技集团股份有限公司 KV database data management method, device, computing device and storage medium
CN115695578A (en) * 2022-09-20 2023-02-03 北京邮电大学 A data center network TCP and RDMA hybrid flow scheduling method, system and device
CN115933973B (en) * 2022-11-25 2023-09-29 中国科学技术大学 Method for remotely updating data, RDMA system and storage medium
CN115861082B (en) * 2023-03-03 2023-04-28 无锡沐创集成电路设计有限公司 Low-delay picture splicing system and method
CN117215995B (en) * 2023-11-08 2024-02-06 苏州元脑智能科技有限公司 Remote direct memory access method, distributed storage system and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107888657A (en) * 2017-10-11 2018-04-06 上海交通大学 Low latency distributed memory system
CN110147345A (en) * 2019-05-22 2019-08-20 南京大学 A kind of key assignments storage system and its working method based on RDMA
CN111078607A (en) * 2019-12-24 2020-04-28 上海交通大学 Method and system for deploying RDMA (remote direct memory Access) and non-volatile memory-oriented network access programming frame

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102404212A (en) * 2011-11-17 2012-04-04 曙光信息产业(北京)有限公司 Cross-platform RDMA communication method based on InfiniBand network
US9495325B2 (en) * 2013-12-30 2016-11-15 International Business Machines Corporation Remote direct memory access (RDMA) high performance producer-consumer message processing
US10628353B2 (en) * 2014-03-08 2020-04-21 Diamanti, Inc. Enabling use of non-volatile media-express (NVMe) over a network
US9727523B2 (en) * 2014-10-27 2017-08-08 International Business Machines Corporation Remote direct memory access (RDMA) optimized high availability for in-memory data storage
CN107665154B (en) * 2016-07-27 2020-12-04 浙江清华长三角研究院 A Reliable Data Analysis Method Based on RDMA and Message Passing
US11042657B2 (en) * 2017-09-30 2021-06-22 Intel Corporation Techniques to provide client-side security for storage of data in a network environment
CN111125049B (en) * 2019-12-24 2023-06-23 上海交通大学 RDMA and nonvolatile memory-based distributed file data block read-write method and system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107888657A (en) * 2017-10-11 2018-04-06 上海交通大学 Low latency distributed memory system
CN110147345A (en) * 2019-05-22 2019-08-20 南京大学 A kind of key assignments storage system and its working method based on RDMA
CN111078607A (en) * 2019-12-24 2020-04-28 上海交通大学 Method and system for deploying RDMA (remote direct memory Access) and non-volatile memory-oriented network access programming frame

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RDMA; lailikes; https://blog.csdn.net/songchuwang1868/article/details/83178536; 2018-10-19; pages 1-9 *

Also Published As

Publication number Publication date
CN111459418A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
CN111459418B (en) An RDMA-based key-value storage system transmission method
US7415470B2 (en) Capturing and re-creating the state of a queue when migrating a session
CN111600936B (en) Asymmetric processing system based on multiple containers and suitable for ubiquitous electric power internet of things edge terminal
US20200228433A1 (en) Computer-readable recording medium including monitoring program, programmable device, and monitoring method
JP2022501736A (en) Efficient state maintenance of the execution environment in the on-demand code execution system
CN109240946A (en) The multi-level buffer method and terminal device of data
WO2018120171A1 (en) Method, device and system for executing stored procedure
CN102981911B (en) A distributed message processing system and its equipment and method
US10102230B1 (en) Rate-limiting secondary index creation for an online table
US20250247361A1 (en) Cross-security-region resource access method in cloud computing system and electronic device
US11297141B2 (en) Filesystem I/O scheduler
WO2023046141A1 (en) Acceleration framework and acceleration method for database network load performance, and device
WO2021121041A1 (en) Data transmission optimization method and device, and readable storage medium
CN107247623A (en) A kind of distributed cluster system and data connecting method based on multi-core CPU
JP2009123201A (en) Server-processor hybrid system and method for processing data
CN116455972A (en) Implementation method and system of simulation middleware based on message center communication
CN114328453A (en) KV database data management method, device, computing device and storage medium
CN115686663A (en) Online file preview method and device and computer equipment
WO2020215833A1 (en) Offline cache method and apparatus, and terminal and readable storage medium
CN113190528B (en) Parallel distributed big data architecture construction method and system
CN108075989B (en) Extensible protocol-based load balancing network middleware implementation method
CN112052104A (en) Management method and electronic equipment of message queue based on multi-room implementation
CN111209263A (en) Data storage method, device, equipment and storage medium
AU2017382907B2 (en) Technologies for scaling user interface backend clusters for database-bound applications
CN117082142A (en) Data packet caching method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant