CN118332010B - Data batch insertion method for distributed database - Google Patents


Info

Publication number
CN118332010B
CN118332010B (application CN202410755952.8A)
Authority
CN
China
Prior art keywords
data
message
database
piece
node
Prior art date
Legal status
Active
Application number
CN202410755952.8A
Other languages
Chinese (zh)
Other versions
CN118332010A (en)
Inventor
易成龙
张益�
吴明远
侯晓凡
赵健
Current Assignee
Tianjin Nankai University General Data Technologies Co ltd
Original Assignee
Tianjin Nankai University General Data Technologies Co ltd
Priority date
Filing date
Publication date
Application filed by Tianjin Nankai University General Data Technologies Co ltd
Priority: CN202410755952.8A
Publication of CN118332010A
Application granted
Publication of CN118332010B
Legal status: Active

Classifications

    • G06F16/2453 Query optimisation
    • G06F16/2433 Query languages
    • G06F16/24552 Database cache management
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; distributed database system architectures therefor
    • G06F9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals


Abstract

The invention provides a data batch insertion method for a distributed database. The client constructs and sends a P message containing a parameterized SQL statement to the database; after receiving the P message, the database parses the SQL statement, generates a corresponding query tree, and caches the query tree locally. The client then sends a U message to the database; the database parses the data in the U message using the query tree and performs route calculation, determining the target data node of each row from the row's shard key field and a hash function. According to the hash values, the database packs the rows destined for the same data node into new U messages and sends them to the corresponding data nodes in batches for insertion. Finally, all data nodes determined in T2 are traversed to confirm that the batched U messages have been received and inserted, completing the data insertion process. The invention has the beneficial effects that the efficiency of batch data insertion and the system response speed are significantly improved.

Description

Data batch insertion method for distributed database
Technical Field
The invention belongs to the technical field of databases, and particularly relates to a data batch insertion method for a distributed database.
Background
In a database management system, batch data insertion is realized through the INSERT statement, which allows multiple rows to be inserted into a table at once to improve operational efficiency. The traditional INSERT INTO table_name (column_name_list) VALUES (data1), (data2), (data3) statement improves efficiency by reducing the number of database calls. Distributed database systems such as openGauss further optimize this process: by improving the PBE (Parse, Bind, Execute) flow and introducing a U message mechanism that sends multiple rows at once, they reduce network round trips and repetitive SQL parsing. However, this approach still shows limitations when processing large amounts of data or when fine-grained routing to specific data nodes is required, such as large parsing overhead, low data routing efficiency, and limited parallel processing potential.
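The call-count saving of a multi-row insert can be reproduced with any client library. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the relational systems discussed (the table and column names are illustrative, and sqlite3 is of course not a distributed database):

```python
import sqlite3

# In-memory database as a stand-in; the principle holds for any RDBMS client.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")

rows = [(1, 1, 1), (2, 2, 2), (3, 3, 3)]

# One executemany() call binds every row to a single parameterized statement,
# instead of issuing one database call per row.
conn.executemany("INSERT INTO t (a, b, c) VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 3
```

The same pattern, one parameterized statement plus a batch of bound rows, is exactly what the PBE improvements described below formalize at the protocol level.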
Disclosure of Invention
In view of the foregoing, the present invention aims to propose a data batch insertion method for a distributed database that optimizes the U message processing logic and significantly improves the efficiency of batch data insertion and the system response speed.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows:
A data batch insertion method for a distributed database.
Further, the method comprises the following steps:
T1, sending a P message: the client constructs and sends a P message containing a parameterized SQL statement to the database; after receiving the P message, the database parses the SQL statement, generates a corresponding query tree, and caches the query tree locally;
T2, sending a U message: the client sends a U message to the database; the database parses the data in the U message using the query tree, and determines the target data node of each row by route calculation over the row's shard key field and a hash function;
T3, batch data insertion: according to the hash values, the database packs the rows destined for the same data node into new U messages and sends them to the corresponding data nodes in batches for data insertion;
T4, finishing insertion: all data nodes determined in T2 are traversed to confirm that data insertion is complete, completing the data insertion process.
Further, in T2, sending the U message comprises the following sub-steps:
T21, the client sends a U message to the database, wherein the U message comprises the number of rows to be inserted and a data list;
T22, the database parses the actual data from the U message and processes the shard keys of rows of the same table with the same hash function;
T23, the hash values of the rows are calculated in batches, obtaining the data node number corresponding to each row.
Further, the U message comprises a P message and a B message, and the U message generating step comprises:
S1, the client sends a B message to the database, wherein the B message comprises an SQL statement name and one row to be inserted, and the database binds the B message to the P message of T1;
S2, the client sends an E message to the database, and the database executes the SQL statement of the B message of S1, completing the insertion of one row;
S3, the flow returns to S1 to insert the next row.
Further, in T1, the P message comprises the parameterized SQL statement to be executed and a name used to identify this SQL statement;
in T2, the U message also includes the name of the parameterized SQL statement from the P message.
Further, T3 comprises the following sub-steps:
T31, after receiving the U message, the central node of the database parses all rows and constructs a data set from them;
T32, the database selects an appropriate hash function according to the distribution-column type of the rows in the data set and calculates the hash value of each row, so that the rows are evenly distributed across the data nodes;
T33, a new U message is initialized for each target data node; the data set is traversed and each row is placed into the new U message of its target data node, completing data distribution;
T34, the central node sends each new U message to the corresponding data node, and the data node performs the insertion according to the new U message.
Further, an electronic device comprises a processor and a memory communicatively coupled to the processor and configured to store instructions executable by the processor, the processor being configured to perform the data batch insertion method for a distributed database described above.
Further, a server comprises at least one processor and a memory communicatively coupled to the processor, the memory storing instructions executable by the at least one processor to cause the at least one processor to perform the data batch insertion method for a distributed database.
Further, a computer-readable storage medium stores a computer program which, when executed by a processor, implements the data batch insertion method for a distributed database described above.
Compared with the prior art, the data batch insertion method for a distributed database has the following beneficial effects:
(1) by processing data and messages in batches, the method reduces the overhead of single-row insertion and improves overall insertion efficiency; in particular, when a large amount of data is processed, the load on the database is significantly reduced;
(2) by using parameterized SQL statements and the query-tree cache, the method reduces the database's repeated work of parsing SQL, thereby lowering CPU and memory consumption and optimizing resource utilization;
(3) the processing capacity of the distributed database can be expanded by adding data nodes, and effective data sharding and routing distribute the data evenly across the nodes, improving system scalability and supporting larger data volumes and higher concurrency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
FIG. 1 is a schematic flow chart of a data batch insertion method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a data batch insertion principle (prior art) according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data batch insertion principle (the present application) according to an embodiment of the present application;
Fig. 4 is a flowchart illustrating a procedure for sending a U message according to an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The invention will be described in detail below with reference to the drawings in connection with embodiments.
As shown in fig. 2, the existing data insertion flow uses the PBE (Parse, Bind, Execute) mode to insert one row at a time. Specifically, an insertion is completed by one 'P' message and a sequence of 'B' and 'E' messages, where the P message carries the parameterized SQL statement, each B message carries one row of actual data to be inserted, and each E message triggers execution of the insertion. With this mode, inserting a large amount of data is inefficient because of the many message interactions; the present application therefore merges the multiple 'B' and 'E' messages into a single 'U' message and realizes batch insertion of a large number of rows through one U message, as shown in fig. 3.
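The reduction in round trips can be made concrete with a back-of-the-envelope count. The sketch below is hypothetical, assuming one P message up front, then one B plus one E message per row in the classic flow, versus a single U message in the merged flow:

```python
def pbe_message_count(n_rows: int) -> int:
    # Classic flow: one P message, then one B and one E message per row.
    return 1 + 2 * n_rows

def u_message_count(n_rows: int) -> int:
    # Merged flow: one P message plus a single U message carrying all rows,
    # regardless of the row count.
    return 2

for n in (1, 100, 10_000):
    print(n, pbe_message_count(n), u_message_count(n))
```

The classic flow grows linearly with the row count (201 messages for 100 rows), while the merged flow stays constant, which is the motivation for the U message.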
Specifically, a data batch insertion method for a distributed database, as shown in fig. 1, includes the following steps:
T1, sending a P message: first, the client constructs and sends a P message containing a parameterized SQL statement to the database, for example: insert into table_name (column_name_list) values ($1, $2, ..., $n); then, after receiving the P message, the database parses the SQL statement, generates a corresponding query tree, and caches the query tree locally;
T2, sending a U message: the client sends a U message to the database; the database parses the data in the U message through the query tree and performs route calculation over each row's shard key field and a hash function to determine the target data node of each row;
T3, batch data insertion: the database packs the rows destined for the same data node into new U messages according to their hash values, for example:
DN1 batch insert message: {(1,1,1), (2,2,2)};
DN2 batch insert message: {(3,3,3)};
then, the new U messages are sent to the corresponding data nodes in batches for data insertion;
T4, finishing insertion: all data nodes determined in T2 are traversed to confirm that data insertion is complete, completing the data insertion process.
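The query-tree caching of step T1 can be sketched as a name-keyed dictionary on the database side. Everything below is an illustrative stand-in: the parse function merely tokenizes, whereas a real database would build a genuine query tree.

```python
query_tree_cache: dict[str, tuple] = {}

def parse(sql: str) -> tuple:
    # Stand-in for real SQL parsing: the "query tree" here is just a token tuple.
    return tuple(sql.replace("(", " ( ").replace(")", " ) ").split())

def handle_p_message(name: str, sql: str) -> tuple:
    # Parse once per statement name and cache locally; later U messages
    # reuse the cached tree instead of re-parsing the SQL text.
    if name not in query_tree_cache:
        query_tree_cache[name] = parse(sql)
    return query_tree_cache[name]

handle_p_message("ins_t", "insert into t (a, b, c) values ($1, $2, $3)")
handle_p_message("ins_t", "insert into t (a, b, c) values ($1, $2, $3)")  # cache hit
print(len(query_tree_cache))  # 1
```

The point of the cache is that the parsing cost is paid once per statement name, however many U messages reference it afterwards.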
Specifically, the U message in T2 comprises a P message and a B message, and the U message generating step comprises:
S1, the client sends a B message to the database, wherein the B message comprises an SQL statement name and one row to be inserted, and the database binds the B message to the P message of T1;
S2, the client sends an E message to the database, and the database executes the SQL statement of the B message of S1, completing the insertion of one row;
S3, the flow returns to S1 to insert the next row.
For comparison, suppose the rows (1,1,1), (2,2,2), (3,3,3) are to be inserted. With the classic flow:
P{'sql_name', sql statement};
insert (1,1,1): B{'sql_name', {(1,1,1)}} + E;
insert (2,2,2): B{'sql_name', {(2,2,2)}} + E;
insert (3,3,3): B{'sql_name', {(3,3,3)}} + E.
With a U message, the multiple rows are inserted in batch through a single message between the client and the database:
U{'sql_name', 3, {(1,1,1), (2,2,2), (3,3,3)}}.
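Treating the U message as the triple (statement name, row count, row list), its construction is a one-liner; the tuple layout below is an illustrative stand-in for the actual wire format:

```python
def build_u_message(sql_name: str, rows: list) -> tuple:
    # (statement name, number of rows, row list) mirrors U{'sql_name', 3, {...}}.
    return (sql_name, len(rows), rows)

u = build_u_message("sql_name", [(1, 1, 1), (2, 2, 2), (3, 3, 3)])
print(u)  # ('sql_name', 3, [(1, 1, 1), (2, 2, 2), (3, 3, 3)])
```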
Further, as shown in fig. 4, in T2, sending the U message comprises the following sub-steps:
T21, the client sends a U message to the database, wherein the U message comprises the number of rows to be inserted and a data list, for example:
{
3,
{1,1,1};
{2,2,2};
{3,3,3};
};
T22, the database parses the actual data out of the U message. Since the rows are stored in the same table, their shard keys (one or more fields that determine on which data node a row is stored) have identical types, so the shard keys of rows of the same table can be processed with the same hash function. The advantage is that the hash function for the shard key is determined once, before the insert operation is processed, so the hash value of each row can be calculated quickly and accurately and the row routed to the correct data node according to that value;
T23, for each row to be inserted, the hash values are calculated in batches, obtaining the data node number corresponding to each row.
Specifically, in T22 the type of the shard key of the rows to be inserted is determined first, and then the hash values of all rows are calculated in batches according to that type.
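Steps T22 and T23 amount to one hash computation per row. A sketch, using Python's built-in hash on the first column as the shard key (a real system would pick a type-specific hash function from the shard key's column type, as T32 describes):

```python
def route_rows(rows, shard_key_index: int, n_nodes: int):
    # Map each row to a data-node number via the hash of its shard key.
    return [hash(row[shard_key_index]) % n_nodes for row in rows]

rows = [(1, 1, 1), (2, 2, 2), (3, 3, 3)]
node_ids = route_rows(rows, 0, 2)
print(node_ids)  # [1, 0, 1] in CPython, where hash(n) == n for small ints
```

Because the hash function is fixed up front, the whole list of rows can be routed in one pass, with no per-row parsing or per-row decision about which function to apply.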
Further, T3 comprises the following sub-steps:
T31, after receiving the U message, the central node of the database parses all rows and constructs a data set from them;
T32, the database selects an appropriate hash function according to the distribution-column type of the rows in the data set and calculates the hash value of each row, so that the rows are evenly distributed across the data nodes;
T33, a new U message is initialized for each target data node; the data set is traversed and each row is placed into the new U message of its target data node, completing data distribution;
T34, the central node sends each new U message to the corresponding data node, and the data node performs the insertion according to the new U message.
For example, the original 'U' message:
U{'sql_name', 3, {(1,1,1), (2,2,2), (3,3,3)}}
is split into new 'U' messages:
DN1 'U' message: {'sql_name', 2, {(1,1,1), (2,2,2)}};
DN2 'U' message: {'sql_name', 1, {(3,3,3)}}.
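Steps T31 to T33 are a group-by over the routed rows. In the sketch below the node assignment is fixed by hand to reproduce the DN1/DN2 example above, rather than computed from a real hash function:

```python
from collections import defaultdict

def repack(sql_name, rows, node_of):
    # Group rows by target node and wrap each group in a new U message
    # of the form (statement name, row count, row list).
    groups = defaultdict(list)
    for row in rows:
        groups[node_of(row)].append(row)
    return {node: (sql_name, len(g), g) for node, g in groups.items()}

# Hand-picked routing: (1,1,1) and (2,2,2) go to DN1, (3,3,3) to DN2.
node_of = lambda row: "DN1" if row[0] < 3 else "DN2"
msgs = repack("sql_name", [(1, 1, 1), (2, 2, 2), (3, 3, 3)], node_of)
print(msgs["DN1"])  # ('sql_name', 2, [(1, 1, 1), (2, 2, 2)])
print(msgs["DN2"])  # ('sql_name', 1, [(3, 3, 3)])
```

Each resulting per-node message has the same shape as the original U message, so the data nodes can reuse the same batch-insert path (T34).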
Specifically, in T1, the P message comprises the parameterized SQL statement to be executed and a name used to identify this SQL statement; in T2, the U message also includes the name of the parameterized SQL statement from the P message.
Those of ordinary skill in the art will appreciate that the elements and method steps of each example described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the elements and steps of each example have been described generally in terms of functionality in the foregoing description to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed methods and systems may be implemented in other ways. For example, the above-described division of units is merely a logical function division, and there may be another division manner when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted or not performed. The units may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present application.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention, and are intended to be included within the scope of the appended claims and description.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (5)

1. A data batch insertion method for a distributed database, characterized in that the method comprises the following steps:
T1, sending a P message: the client constructs and sends a P message containing a parameterized SQL statement to the database; after receiving the P message, the database parses the SQL statement, generates a corresponding query tree, and caches the query tree locally;
T2, sending a U message: the client sends a U message to the database; the database parses the data in the U message using the query tree, and determines the target data node of each row by route calculation over the row's shard key field and a hash function;
T3, batch data insertion: according to the hash values, the database packs the rows destined for the same data node into new U messages and sends them to the corresponding data nodes in batches for data insertion;
T4, finishing insertion: all data nodes determined in T2 are traversed to confirm that data insertion is complete, completing the data insertion process;
in T2, sending the U message further comprises the following sub-steps:
T21, the client sends a U message to the database, wherein the U message comprises the data to be inserted and a data list;
T22, the database parses the actual data from the U message and processes the shard keys of rows of the same table with the same hash function;
T23, the hash values of the rows are calculated in batches, obtaining the data node number corresponding to each row;
the U message comprises a P message and a B message, and the U message generating step comprises:
S1, the client sends a B message to the database, wherein the B message comprises an SQL statement name and one row to be inserted, and the database binds the B message to the P message of T1;
S2, the client sends an E message to the database, and the database executes the SQL statement of the B message of S1, completing the insertion of one row;
S3, the flow returns to S1 to insert the next row;
T3 comprises the following sub-steps:
T31, after receiving the U message, the central node of the database parses all rows and constructs a data set from them;
T32, the database selects an appropriate hash function according to the distribution-column type of the rows in the data set and calculates the hash value of each row, so that the rows are evenly distributed across the data nodes;
T33, a new U message is initialized for each target data node; the data set is traversed and each row is placed into the new U message of its target data node, completing data distribution;
T34, the central node sends each new U message to the corresponding data node, and the data node performs the insertion according to the new U message.
2. The data batch insertion method for a distributed database according to claim 1, characterized in that: in T1, the P message comprises the parameterized SQL statement to be executed and a name used to identify this SQL statement;
in T2, the U message also includes the name of the parameterized SQL statement from the P message.
3. An electronic device comprising a processor and a memory communicatively coupled to the processor for storing processor-executable instructions, characterized in that: the processor is configured to perform a data batch insertion method for a distributed database as claimed in any one of the preceding claims 1-2.
4. A server, characterized by: comprising at least one processor and a memory communicatively coupled to the processor, the memory storing instructions executable by the at least one processor to cause the at least one processor to perform a data bulk insertion method for a distributed database as claimed in any one of claims 1-2.
5. A computer-readable storage medium storing a computer program, characterized in that: the computer program, when executed by a processor, implements a data bulk insertion method for a distributed database as claimed in any one of claims 1-2.
CN202410755952.8A 2024-06-13 2024-06-13 Data batch insertion method for distributed database Active CN118332010B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410755952.8A CN118332010B (en) 2024-06-13 2024-06-13 Data batch insertion method for distributed database


Publications (2)

Publication Number Publication Date
CN118332010A CN118332010A (en) 2024-07-12
CN118332010B true CN118332010B (en) 2024-10-11

Family

ID=91769652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410755952.8A Active CN118332010B (en) 2024-06-13 2024-06-13 Data batch insertion method for distributed database

Country Status (1)

Country Link
CN (1) CN118332010B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112015820A (en) * 2020-09-01 2020-12-01 杭州欧若数网科技有限公司 Method, system, electronic device and storage medium for implementing distributed graph database
CN112231752A (en) * 2020-10-20 2021-01-15 南开大学 Ciphertext insertion query deletion method without interactive frequency hiding
CN114297278A (en) * 2021-12-22 2022-04-08 苏州众言网络科技股份有限公司 Batch data fast writing method, system and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10019297B2 (en) * 2013-04-03 2018-07-10 Salesforce.Com, Inc. Systems and methods for implementing bulk handling in asynchronous processing
CN114428820B (en) * 2022-01-26 2025-07-15 普联技术有限公司 Distributed data real-time synchronization method, system and data synchronization device


Also Published As

Publication number Publication date
CN118332010A (en) 2024-07-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant