CN118332010B - Data batch insertion method for distributed database
- Publication number
- CN118332010B (application CN202410755952.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- message
- database
- piece
- node
- Prior art date
- Legal status
- Active
Classifications
- G06F16/2453 (Query optimisation)
- G06F16/2433 (Query languages)
- G06F16/24552 (Database cache management)
- G06F16/27 (Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor)
- G06F9/5027 (Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, servers, terminals)
Abstract
The invention provides a data batch insertion method for a distributed database. A client constructs and sends a P message containing a parameterized SQL statement to the database; after receiving the P message, the database parses the SQL statement, generates a corresponding query tree, and caches the query tree locally. The client then sends a U message to the database; the database parses the data in the U message using the query tree and performs route calculation with the shard key field of the data and a hash function to determine the target data node of each piece of data. According to the hash values, the database packages the data destined for the same data node into a new U message and sends the new U messages to the corresponding data nodes in batches for insertion. Finally, all the data nodes determined in T2 are traversed to confirm that the batched U messages have been received and the data inserted, completing the data insertion process. The invention has the beneficial effects that the efficiency of batch data insertion and the system response speed are remarkably improved.
Description
Technical Field
The invention belongs to the technical field of databases, and particularly relates to a data batch insertion method for a distributed database.
Background
In a database management system, data batch insertion is realized through an INSERT statement, which allows multiple pieces of data to be inserted into a table at the same time so as to improve operation efficiency. The traditional INSERT INTO table_name (column_name_list) VALUES (data1), (data2), (data3) statement improves efficiency by reducing the number of database calls. Distributed database systems such as openGauss further optimize this process by improving the PBE (Parse, Bind, Execute) flow and using a U message mechanism to send multiple pieces of data at once, which reduces network round trips and repetitive SQL parsing. However, this approach still shows limitations when processing large amounts of data or when data must be finely routed to specific data nodes, such as large parsing overhead, low data routing efficiency, and limited parallel processing potential.
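For illustration, a minimal sketch of the traditional multi-row INSERT described above is given below (in Python); the table and column names are hypothetical examples, repr() is used only for brevity and is not safe SQL quoting, and the sketch is not part of the method of the present application:

```python
# Illustrative sketch: building the classic multi-row INSERT turns len(rows)
# separate statements into a single database call. Table and column names are
# hypothetical; repr() is NOT safe SQL quoting, it only keeps the demo short.

def build_bulk_insert(table: str, columns: list[str], rows: list[tuple]) -> str:
    """Render INSERT INTO table (cols) VALUES (r1), (r2), ... as one statement."""
    col_list = ", ".join(columns)
    value_list = ", ".join(
        "(" + ", ".join(repr(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({col_list}) VALUES {value_list};"

print(build_bulk_insert("t1", ["c1", "c2", "c3"], [(1, 1, 1), (2, 2, 2), (3, 3, 3)]))
# INSERT INTO t1 (c1, c2, c3) VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3);
```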
Disclosure of Invention
In view of the foregoing, the present invention aims to propose a data batch insertion method for a distributed database, so as to optimize the U message processing logic, and significantly improve the efficiency of batch data insertion and the system response speed.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows:
A data batch insertion method for a distributed database.
Further, the method comprises the following steps:
T1, sending a P message: the client constructs and sends a P message containing a parameterized SQL statement to the database; after receiving the P message, the database parses the SQL statement, generates a corresponding query tree, and caches the query tree locally;
T2, sending a U message: the client sends a U message to the database; the database parses the data in the U message using the query tree, and performs route calculation with the shard key field of the data and a hash function to determine the target data node of each piece of data;
T3, batch data insertion: according to the hash values, the database packages the data destined for the same data node into a new U message, and sends the new U messages to the corresponding data nodes in batches for data insertion;
T4, finishing insertion: all the data nodes determined in T2 are traversed to confirm that their data insertion is completed, thereby completing the data insertion process.
Further, in T2, sending the U message further includes the following sub-steps:
T21, the client sends a U message to the database, wherein the U message includes the number of pieces of data to be inserted and a data list;
T22, the database parses the actual data out of the U message, and the shard keys of the data of the same table are processed with the same hash function;
T23, the hash values of the data are calculated in batches, and the data node number corresponding to each piece of data is obtained.
Further, the U message includes a P message and a B message, and the U message generating step includes:
S1, the client sends a B message to the database, wherein the B message includes an SQL statement name and a piece of data to be inserted, and the database binds the B message with the P message in T1;
S2, the client sends an E message to the database, and the database executes the SQL statement bound by the B message in S1 to complete the insertion of one piece of data;
S3, return to S1 and execute the insertion of the next piece of data.
Further, in T1, the P message includes the parameterized SQL statement to be executed and a statement name used to identify the SQL statement;
in T2, the U message also includes the name of the parameterized SQL statement in the P message.
Further, T3 includes the following sub-steps:
T31, after receiving the U message, the central node of the database parses all the data and constructs a data set from the data;
T32, the database selects a suitable hash function according to the distribution column type of any piece of data in the data set, and calculates the hash value of each piece of data, so that the data are evenly distributed to the data nodes;
T33, a new U message is initialized for the target data node of each piece of data; the data set is traversed, and each piece of data is put into the corresponding new U message according to its target data node, so as to complete data distribution;
T34, the central node sends each new U message to the corresponding data node, and the data node performs data insertion according to the new U message.
Further, an electronic device includes a processor and a memory communicatively coupled to the processor and configured to store instructions executable by the processor, the processor configured to perform a data batch insertion method for a distributed database as described above.
Further, a server includes at least one processor and a memory communicatively coupled to the processor, the memory storing instructions executable by the at least one processor to cause the at least one processor to perform the data batch insertion method for a distributed database described above.
Further, a computer readable storage medium stores a computer program which, when executed by a processor, implements the data batch insertion method for a distributed database described above.
Compared with the prior art, the data batch insertion method for the distributed database has the following beneficial effects:
(1) According to the data batch insertion method, data and messages are processed in batches, so that the overhead of inserting data one piece at a time is reduced and the overall data insertion efficiency is improved; in particular, when a large amount of data is processed, the load on the database is significantly reduced;
(2) According to the data batch insertion method, by using a parameterized SQL statement and caching the query tree, the repeated work of parsing SQL in the database is reduced, so that CPU and memory consumption is lowered and resource utilization is optimized;
(3) According to the data batch insertion method, the processing capacity of the distributed database can be expanded by adding data nodes, and the data can be evenly distributed over the nodes through effective data sharding and routing, so that the scalability of the system is improved and larger data volumes and higher concurrency are supported.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
FIG. 1 is a schematic flow chart of a data batch insertion method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a data batch insertion principle (prior art) according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data batch insertion principle (the present application) according to an embodiment of the present application;
Fig. 4 is a flowchart illustrating a procedure for sending a U message according to an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The invention will be described in detail below with reference to the drawings in connection with embodiments.
As shown in fig. 2, the existing data insertion flow uses the PBE (Parse, Bind, Execute) mode to insert one piece of data at a time; specifically, the insertion is completed by one 'P' message together with multiple 'B' and 'E' messages, wherein the P message carries the parameterized SQL statement, each B message contains one piece of actual data to be inserted, and each E message triggers the execution of the insertion action for that piece of data. When a large amount of data is inserted in this way, the insertion efficiency is low because of the many message interactions, so the present application merges the multiple 'B' and 'E' messages into a single 'U' message and inserts a large amount of data at one time through that U message, as shown in fig. 3.
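The difference between the two flows can be sketched as follows; the message contents are abstracted as tuples and 'sql_name' is a hypothetical statement name, so this is only a simplified model of the message sequences, not the actual wire protocol:

```python
# Simplified model of the message sequences of fig. 2 and fig. 3.
rows = [(1, 1, 1), (2, 2, 2), (3, 3, 3)]
sql = "insert into table_name (c1, c2, c3) values ($1, $2, $3)"

# Prior PBE flow (fig. 2): one P message, then a B + E pair per piece of data.
pbe_flow = [("P", "sql_name", sql)]
for row in rows:
    pbe_flow += [("B", "sql_name", row), ("E",)]
assert len(pbe_flow) == 1 + 2 * len(rows)   # 7 messages for 3 rows

# Flow of this application (fig. 3): one P message, then a single U message
# carrying the statement name, the row count and all rows at once.
u_flow = [("P", "sql_name", sql), ("U", "sql_name", len(rows), rows)]
assert len(u_flow) == 2                      # independent of the row count
```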
Specifically, a data batch insertion method for a distributed database, as shown in fig. 1, includes the following steps:
T1, sending a P message: first, the client constructs and sends a P message containing a parameterized SQL statement to the database, for example: insert into table_name (column_name_list) values ($1, $2, ..., $n); then, after receiving the P message, the database parses the SQL statement of the P message, generates a corresponding query tree, and caches the query tree locally;
T2, sending a U message: the client sends a U message to the database; the database parses the data in the U message through the query tree to determine the destination data node of each piece of data, and performs route calculation according to the shard key field of the data and a hash function;
T3, batch data insertion: the database packages the data destined for the same data node into a new U message according to the hash values, for example:
DN1 batch insert message: {1,1,1} {2,2,2};
DN2 batch insert message: {3,3,3};
then, the new U messages are sent to the corresponding data nodes in batches for data insertion;
T4, finishing insertion: all the data nodes determined in T2 are traversed to confirm that their data insertion is completed, thereby completing the data insertion process.
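A minimal sketch of the query-tree caching in T1 and its reuse in T2 is given below, assuming a plain dictionary as a stand-in for the database's internal query-tree cache; the real cache lives inside the database server and stores parsed query trees rather than strings:

```python
# Sketch of T1/T2 caching behaviour under stated assumptions.
query_tree_cache: dict[str, dict] = {}

def parse_sql(sql: str) -> dict:
    # Placeholder for the server-side parser; returns an opaque query tree.
    return {"parsed": sql}

def handle_p_message(stmt_name: str, sql: str) -> None:
    """T1: parse the parameterized SQL once and cache the query tree by name."""
    if stmt_name not in query_tree_cache:
        query_tree_cache[stmt_name] = parse_sql(sql)

def handle_u_message(stmt_name: str, rows: list[tuple]):
    """T2: reuse the cached query tree; no repeated SQL parsing per piece of data."""
    tree = query_tree_cache[stmt_name]
    return tree, rows   # rows then go on to route calculation (T2) and grouping (T3)

handle_p_message("sql_name", "insert into table_name (c1, c2, c3) values ($1, $2, $3)")
handle_u_message("sql_name", [(1, 1, 1), (2, 2, 2), (3, 3, 3)])
```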
Specifically, the U message in T2 includes a P message and a B message, and the U message generating step includes:
S1, the client sends a B message to the database, wherein the B message includes an SQL statement name and a piece of data to be inserted, and the database binds the B message with the P message in T1;
S2, the client sends an E message to the database, and the database executes the SQL statement bound by the B message in S1 to complete the insertion of one piece of data;
S3, return to S1 and execute the insertion of the next piece of data.
The advantage can be illustrated by an example: to insert the pieces of data (1,1,1), (2,2,2), (3,3,3) one at a time, the message sequence is:
P { 'sql_name', sql statement };
insert (1,1,1):
B{‘sql_name’,{(1,1,1)}}+E;
insert (2,2,2):
B{‘sql_name’,{(2,2,2)}}+E;
insert (3,3,3):
B{‘sql_name’,{(3,3,3)}}+E.
By contrast, multiple pieces of data, such as (1,1,1), (2,2,2), (3,3,3), are inserted in one batch through a single U message between the client and the database:
U{‘sql_name’,3,{(1,1,1),(2,2,2),(3,3,3)}}。
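The three logical fields of the U message shown above (statement name, number of pieces of data, data list) may be represented in memory, for example, as follows; the actual message is a protocol frame, and this dataclass only mirrors its content:

```python
# A possible in-memory view of the U message's three logical fields.
from dataclasses import dataclass

@dataclass
class UMessage:
    stmt_name: str   # name of the parameterized SQL statement from the P message
    row_count: int   # number of pieces of data to insert
    rows: list       # the data list, e.g. [(1, 1, 1), (2, 2, 2), (3, 3, 3)]

msg = UMessage("sql_name", 3, [(1, 1, 1), (2, 2, 2), (3, 3, 3)])
assert msg.row_count == len(msg.rows)
```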
Further, as shown in fig. 4, in T2, sending the U message includes the following sub-steps:
T21, the client sends a U message to the database, wherein the U message includes the number of pieces of data to be inserted and a data list, for example:
{
3,
{1,1,1};
{2,2,2};
{3,3,3};
};
T22, the database parses the actual data out of the U message. Since the data belong to the same table, their shard keys (one or more fields that determine on which data node a piece of data should be stored) are defined identically, so the shard keys of data of the same table can be processed with the same hash function. The advantage is that the hash function for the shard key is determined before the insert operation is processed, so the hash value of each piece of data can be calculated quickly and accurately, and the data can be routed to the correct data node according to that hash value;
T23, for each piece of data to be inserted, the hash values are calculated in batches, and the data node number corresponding to each piece of data is obtained.
Specifically, in T22, the type of the shard key of the data to be inserted is determined first, and then the hash values of all the data are calculated in batches according to the type of the shard key.
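A sketch of T22/T23 under stated assumptions is given below; the per-type hash functions and the modulo placement rule are illustrative stand-ins rather than the database's actual hashing scheme:

```python
# Illustrative routing sketch: choose one hash function per shard-key type,
# then compute every row's target data-node number in a single batch pass.
import zlib

def pick_hash_fn(shard_key_type: str):
    """Choose one hash function for the whole batch, based on the shard key type."""
    if shard_key_type in ("int", "bigint"):
        return lambda v: int(v)                      # integer keys hash to themselves
    return lambda v: zlib.crc32(str(v).encode())     # fallback for text-like types

def route_rows(rows, shard_key_index, shard_key_type, num_nodes):
    """T23: compute the target data-node number for every piece of data in one pass."""
    h = pick_hash_fn(shard_key_type)
    return [h(row[shard_key_index]) % num_nodes for row in rows]

# Example: shard key is the first column, 2 data nodes.
print(route_rows([(1, 1, 1), (2, 2, 2), (3, 3, 3)], 0, "int", 2))
# [1, 0, 1] with this toy hash; the real assignment depends on the database's own hash.
```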
Further, T3 comprises the following sub-steps:
T31, after receiving the U message, the central node of the database parses all the data and constructs a data set from the data;
T32, the database selects a suitable hash function according to the distribution column type of any piece of data in the data set, and calculates the hash value of each piece of data, so that the data are evenly distributed to the data nodes;
T33, a new U message is initialized for the target data node of each piece of data; the data set is traversed, and each piece of data is put into the corresponding new U message according to its target data node, so as to complete data distribution;
T34, the central node sends each new U message to the corresponding data node, and the data node performs data insertion according to the new U message.
For example, the original 'U' message:
U{‘sql_name’,3,{(1,1,1),(2,2,2),(3,3,3)}}.
New 'U' message:
DN1 'U' message: {‘sql_name’,2,{(1,1,1),(2,2,2)}};
DN2 'U' message: {‘sql_name’,1,{(3,3,3)}}.
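The splitting of the original U message into new per-node U messages (T31 to T34) can be sketched as follows; the per-piece node assignment is taken directly from the example above rather than recomputed from a hash, so the output matches the DN1/DN2 messages shown:

```python
# Sketch of T31-T34: split one U message into one new U message per data node.
from collections import defaultdict

original = ("sql_name", 3, [(1, 1, 1), (2, 2, 2), (3, 3, 3)])
target_nodes = ["DN1", "DN1", "DN2"]          # per-piece routing result from T2

def split_u_message(u_msg, nodes):
    stmt_name, _, rows = u_msg
    groups = defaultdict(list)
    for row, node in zip(rows, nodes):        # T33: distribute each piece by its node
        groups[node].append(row)
    # T34: one new U message per data node, sent to that node in a single batch
    return {node: (stmt_name, len(rs), rs) for node, rs in groups.items()}

print(split_u_message(original, target_nodes))
# {'DN1': ('sql_name', 2, [(1, 1, 1), (2, 2, 2)]), 'DN2': ('sql_name', 1, [(3, 3, 3)])}
```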
Specifically, in T1, the P message includes the parameterized SQL statement to be executed and a statement name used to identify the SQL statement; in T2, the U message also includes the name of the parameterized SQL statement in the P message.
Those of ordinary skill in the art will appreciate that the elements and method steps of each example described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the elements and steps of each example have been described generally in terms of functionality in the foregoing description to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed methods and systems may be implemented in other ways. For example, the above-described division of units is merely a logical function division, and there may be another division manner when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted or not performed. The units may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present application.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention, and are intended to be included within the scope of the appended claims and description.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.
Claims (5)
1. A data batch insertion method for a distributed database, characterized in that: the method comprises the following steps:
T1, sending a P message: the client constructs and sends a P message containing a parameterized SQL statement to the database; after receiving the P message, the database parses the SQL statement, generates a corresponding query tree, and caches the query tree locally;
T2, sending a U message: the client sends a U message to the database; the database parses the data in the U message using the query tree, and performs route calculation with the shard key field of the data and a hash function to determine the target data node of each piece of data;
T3, batch data insertion: according to the hash values, the database packages the data destined for the same data node into a new U message, and sends the new U messages to the corresponding data nodes in batches for data insertion;
T4, finishing insertion: all the data nodes determined in T2 are traversed to confirm that their data insertion is completed, thereby completing the data insertion process;
In said T2, sending the U message further comprises the sub-steps of:
T21, the client sends a U message to the database, wherein the U message comprises the number of pieces of data to be inserted and a data list;
T22, the database parses the actual data out of the U message, and the shard keys of the data of the same table are processed with the same hash function;
T23, the hash values of the data are calculated in batches, and the data node number corresponding to each piece of data is obtained;
The U message comprises a P message and a B message, and the U message generating step comprises the following steps:
S1, the client sends a B message to the database, wherein the B message comprises an SQL statement name and a piece of data to be inserted, and the database binds the B message with the P message in T1;
S2, the client sends an E message to the database, and the database executes the SQL statement bound by the B message in S1 to complete the insertion of one piece of data;
S3, return to S1 to execute the insertion of the next piece of data;
the T3 comprises the following substeps:
T31, after receiving the U message, the central node of the database parses all the data and constructs a data set from the data;
T32, the database selects a suitable hash function according to the distribution column type of any piece of data in the data set, and calculates the hash value of each piece of data, so that the data are evenly distributed to the data nodes;
T33, a new U message is initialized for the target data node of each piece of data; the data set is traversed, and each piece of data is put into the corresponding new U message according to its target data node, so as to complete data distribution;
T34, the central node sends each new U message to the corresponding data node, and the data node performs data insertion according to the new U message.
2. The data batch insertion method for a distributed database according to claim 1, wherein: in T1, the P message comprises the parameterized SQL statement to be executed and a statement name used to identify the SQL statement;
in T2, the U message also comprises the name of the parameterized SQL statement in the P message.
3. An electronic device comprising a processor and a memory communicatively coupled to the processor for storing processor-executable instructions, characterized in that: the processor is configured to perform a data batch insertion method for a distributed database as claimed in any one of the preceding claims 1-2.
4. A server, characterized by: comprising at least one processor and a memory communicatively coupled to the processor, the memory storing instructions executable by the at least one processor to cause the at least one processor to perform the data batch insertion method for a distributed database as claimed in any one of claims 1-2.
5. A computer-readable storage medium storing a computer program, characterized in that: the computer program, when executed by a processor, implements the data batch insertion method for a distributed database as claimed in any one of claims 1-2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410755952.8A CN118332010B (en) | 2024-06-13 | 2024-06-13 | Data batch insertion method for distributed database |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410755952.8A CN118332010B (en) | 2024-06-13 | 2024-06-13 | Data batch insertion method for distributed database |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118332010A CN118332010A (en) | 2024-07-12 |
CN118332010B true CN118332010B (en) | 2024-10-11 |
Family
ID=91769652
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410755952.8A Active CN118332010B (en) | 2024-06-13 | 2024-06-13 | Data batch insertion method for distributed database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118332010B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112015820A (en) * | 2020-09-01 | 2020-12-01 | 杭州欧若数网科技有限公司 | Method, system, electronic device and storage medium for implementing distributed graph database |
CN112231752A (en) * | 2020-10-20 | 2021-01-15 | 南开大学 | Ciphertext insertion query deletion method without interactive frequency hiding |
CN114297278A (en) * | 2021-12-22 | 2022-04-08 | 苏州众言网络科技股份有限公司 | Batch data fast writing method, system and device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10019297B2 (en) * | 2013-04-03 | 2018-07-10 | Salesforce.Com, Inc. | Systems and methods for implementing bulk handling in asynchronous processing |
CN114428820B (en) * | 2022-01-26 | 2025-07-15 | 普联技术有限公司 | Distributed data real-time synchronization method, system and data synchronization device |
- 2024-06-13: CN202410755952.8A granted as patent CN118332010B (en), status Active
Also Published As
Publication number | Publication date |
---|---|
CN118332010A (en) | 2024-07-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||