
CN112307117B - Synchronization method and synchronization system based on log analysis - Google Patents

Synchronization method and synchronization system based on log analysis

Info

Publication number
CN112307117B
Authority
CN
China
Prior art keywords
rollback
partial rollback
interval
transaction
partial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011056091.2A
Other languages
Chinese (zh)
Other versions
CN112307117A (en
Inventor
孙峰
彭青松
刘启春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Dream Database Co ltd
Original Assignee
Wuhan Dream Database Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Dream Database Co ltd filed Critical Wuhan Dream Database Co ltd
Priority to CN202011056091.2A priority Critical patent/CN112307117B/en
Publication of CN112307117A publication Critical patent/CN112307117A/en
Application granted granted Critical
Publication of CN112307117B publication Critical patent/CN112307117B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1469Backup restoration techniques
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1471Saving, restoring, recovering or retrying involving logging of persistent data for recovery
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a synchronization method and a synchronization system based on log analysis. The synchronization method comprises the following steps: the log receiving thread determines the type of each operation; when the operation is a DML operation, the DML operation and its operation number are added to the corresponding transaction cache file, the variable y is updated to the operation number of the current DML operation, and the storage LSN is updated to the log sequence number of the current DML operation; when the operation is a partial rollback operation, a partial rollback interval [x, y] is constructed from the target operation number x and the target variable y, the interval is added to a partial rollback linked list, and the storage LSN is updated to the log sequence number of the current partial rollback operation; and when the operation is a commit operation, the corresponding transaction is dispatched to an execution thread, which performs data synchronization according to the operation number of each operation to be executed and the corresponding partial rollback linked list.

Description

Synchronization method and synchronization system based on log analysis
Technical Field
The invention belongs to the technical field of synchronization, and particularly relates to a log analysis-based synchronization method and a log analysis-based synchronization system.
Background
Real-time database synchronization is a technical scheme for improving the availability of an information system and ensuring business continuity. Real-time synchronization keeps the business data of the target database consistent with that of the source database, so that when the source database fails and service is interrupted, the application system can quickly switch to the target database and business continuity is maintained.
Real-time database replication based on log analysis has little impact on the performance and data schema of the source database, supports heterogeneous operating systems and database platforms, and offers high replication performance. It is widely used in emergency disaster recovery, multi-service centers, heterogeneous resource integration, data migration and similar fields. In this technology, a log capture process at the source end captures the online log or archive log of the source database, parses the INSERT, UPDATE and DELETE operations of the database, converts them into message packets in an internal format, and sends the packets to the destination end of the replication system over a TCP/IP (Transmission Control Protocol/Internet Protocol) network. The destination end receives and unpacks the message packets, restores the source-end transaction information into the corresponding SQL (Structured Query Language) statements, and applies them to the target database in real time through a local database interface, thereby synchronizing the database data.
In a data synchronization system, the source-end data synchronization service captures database operations in the order in which the database log is generated, and the destination-end data synchronization service receives and manages transactions in the order in which the source end sends the operations. The destination end groups operations by transaction ID and executes a transaction only when its commit message is received, so all operations of a transaction must be cached until the commit message arrives. The number of operations in a transaction is unbounded; if they are all cached in memory, memory resources run short and, in severe cases, the operating system goes down. A common approach is therefore to cache transaction operations on disk. However, a transaction may be partially rolled back after its operations have been cached to disk, and the cached operations then need to be cleaned up. Current cleanup methods either locate the rollback position by scanning backwards and truncate the file, or mark the rolled-back operations. If operations are cached in compressed batches, the compressed packets must also be decompressed, which makes the process more complex. All of these cleanup methods generate random IO, and a large-scale partial rollback can exhaust IO resources and degrade the performance of other programs on the server.
In view of this, overcoming the shortcomings of the prior art products is a problem to be solved in the art.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the invention provides a synchronization method and a synchronization system based on log analysis. The aim is to form partial rollback intervals from operation numbers and collect partial rollback actions in a partial rollback linked list, leaving the already packed and cached operations untouched; this wastes some disk space but saves the IO overhead of deleting or marking partially rolled-back operations.
To achieve the above object, according to one aspect of the present invention, there is provided a synchronization method based on log analysis. The synchronization method is applied to a destination-end data synchronization system, which is provided with a log receiving thread and an execution thread. A transaction cache file is set up on disk for each transaction; each transaction cache file has an associated variable y and contains a partial rollback linked list and a storage LSN. The synchronization method comprises the following steps:
the log receiving thread determines the type of the operation;
when the operation is a DML operation, acquiring the operation number of the DML operation and the transaction ID to which it belongs, and determining the corresponding transaction cache file according to the transaction ID;
adding the DML operation and the operation number to the corresponding transaction cache file, updating the variable y to the operation number of the current DML operation, and updating the storage LSN to the log sequence number of the current DML operation;
when the operation is a partial rollback operation, acquiring the transaction ID to which the partial rollback operation belongs and the rollback target operation number x, and determining the corresponding transaction cache file according to the transaction ID to obtain the target variable y;
constructing a partial rollback interval [x, y] from the target operation number x and the target variable y, adding the partial rollback interval [x, y] to the partial rollback linked list, and updating the storage LSN to the log sequence number of the current partial rollback operation;
and when the operation is a commit operation, dispatching the corresponding transaction to the execution thread, which performs data synchronization according to the operation number of each operation to be executed and the corresponding partial rollback linked list.
Preferably, constructing the partial rollback interval [x, y] from the target operation number x and the target variable y and adding it to the partial rollback linked list comprises:
constructing a partial rollback interval [x, y] from the target operation number x and the target variable y;
adding the partial rollback interval [x, y] to the partial rollback linked list in ascending order of the target operation number x;
judging whether the newly added partial rollback interval and an existing partial rollback interval are adjacent intervals;
if they are adjacent intervals, merging the newly added partial rollback interval with the existing partial rollback interval to obtain a merged partial rollback interval;
updating the variable y to the starting value x of the merged partial rollback interval minus 1;
if they are not adjacent intervals, updating the variable y to the starting value x of the newly added partial rollback interval minus 1.
Preferably, two intervals are adjacent when the y value of the preceding interval plus 1 equals the x value of the following interval.
Preferably, the synchronization method further comprises:
when the operation is a rollback operation, deleting the transaction cache file corresponding to the rollback operation and releasing all of the transaction's operations cached in memory.
Preferably, adding the DML operation and the operation number to the corresponding transaction cache file, updating the variable y to the operation number of the current DML operation, and updating the storage LSN to the log sequence number of the current DML operation comprises:
first storing the DML operation and the operation number in the corresponding memory;
judging whether the cache critical point has been reached;
if the cache critical point has been reached, compressing all DML operations in memory to obtain compressed data, and adding the compressed data and the interval information of the partial rollback linked list to the corresponding transaction cache file;
updating the variable y to the operation number of the current DML operation, and updating the storage LSN to the log sequence number of the current DML operation.
Preferably, the execution thread performing data synchronization according to the operation number of the operation to be executed and the corresponding partial rollback linked list comprises:
after receiving a transaction to be executed, the execution thread takes an operation to be executed out of the corresponding transaction cache file and acquires its operation number z;
taking partial rollback intervals [x, y] out of the partial rollback linked list in sequence;
and determining, according to the relation between the operation number z and the partial rollback interval [x, y], whether to apply the partial rollback to the operation, so as to perform data synchronization.
Preferably, determining whether to apply the partial rollback according to the relation between the operation number z and the partial rollback interval [x, y] comprises:
judging whether the operation number z is smaller than the interval starting value x;
if the operation number z is smaller than the interval starting value x, executing the operation to be executed and taking out the next operation to be executed;
if the operation number z is not smaller than the interval starting value x, judging whether the operation number z is greater than the interval ending value y;
if the operation number z is not greater than the interval ending value y, the operation to be executed lies within the partial rollback interval; discarding the operation and taking out the next operation to be executed.
Preferably, the synchronization method further comprises:
if the operation number z is greater than the interval ending value y, taking out the next partial rollback interval [x, y] and repeating the comparison between the operation number z and that interval, until all partial rollback intervals have been traversed.
Preferably, the synchronization method further comprises:
if the operation number z is greater than the interval ending value y of the last partial rollback interval, executing the operation to be executed;
and thereafter executing each subsequently fetched operation directly.
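The comparison of each operation number z against the ordered rollback intervals can be illustrated with a minimal Python sketch. It assumes that operations are read from the cache file in ascending operation-number order and that the intervals are sorted and non-overlapping; the function name `surviving_operations` and the tuple representation of an interval are illustrative choices, not part of the source.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # inclusive partial rollback interval [x, y]


def surviving_operations(op_numbers: Iterable[int],
                         rollback_intervals: List[Interval]) -> List[int]:
    """Return the operation numbers that should be executed.

    Assumes op_numbers is ascending and rollback_intervals is sorted by x
    and non-overlapping (hypothetical preconditions for this sketch).
    """
    kept: List[int] = []
    idx = 0  # index of the interval currently being compared against
    for z in op_numbers:
        # z > y: move on to the next interval until one can contain z
        while idx < len(rollback_intervals) and z > rollback_intervals[idx][1]:
            idx += 1
        if idx == len(rollback_intervals):
            # Past the last interval: execute this and all later operations
            kept.append(z)
            continue
        x, _y = rollback_intervals[idx]
        if z < x:
            kept.append(z)  # before the interval: execute
        # otherwise x <= z <= y: the operation was rolled back, discard it
    return kept
```

Because both sequences are ascending, each interval is visited at most once, so the filter runs in a single pass over the cached operations.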
To achieve the above object, according to another aspect of the present invention, there is provided a synchronization system including at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being programmed to perform the synchronization method of the present invention.
In general, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects. In the invention, each DML operation has an incrementing operation number. When a partial rollback occurs, a partial rollback interval is formed from the operation numbers and the partial rollback action is collected in a partial rollback linked list, while the already packed and cached operations are left untouched; this wastes some disk space but saves the IO overhead of deleting or marking partially rolled-back operations. In addition, the partial rollback linked list is saved to the cache file during caching, which speeds up recovery from synchronization exceptions and avoids re-caching the transaction operations received before the fault.
Drawings
FIG. 1 is a schematic flow chart of a synchronization method based on log analysis according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating an execution process of a log receiving thread according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an execution process of an execution thread according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a data structure of a transaction cache file according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a data structure of another transaction cache file according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a data structure of a transaction cache file according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a synchronization system according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In the description of the present invention, the terms "inner", "outer", "longitudinal", "transverse", "upper", "lower", "top", "bottom", etc. refer to an orientation or positional relationship based on that shown in the drawings, merely for convenience of describing the present invention and do not require that the present invention must be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
Example 1:
Caching data into disk files is a simple sequential write operation. In log-analysis-based synchronization, however, caching transactions is more complex because a transaction may be partially rolled back (a partial rollback rolls back only some of the operations in a transaction, not all of them): a large number of earlier operations may already have been cached into disk files, and a partial rollback would then require them to be rolled back one by one.
To solve this problem, the invention caches transactions with a strategy that trades space for performance. During caching, transaction operations are written sequentially to the cache file in packed, compressed batches of multiple operations. Sequential file writes help improve IO performance, and packing multiple operations together improves the compression ratio of the operation data and so saves disk space. When a transaction is partially rolled back, the partial rollback actions are collected in a partial rollback linked list and the operations already packed into the cache are left untouched; this wastes some disk space but eliminates the IO overhead of deleting or marking partially rolled-back operations. When multiple partial rollback actions are received, adjacent ones are merged and the operations to be rolled back are expressed as range intervals, which effectively shortens the partial rollback linked list. In addition, a checkpoint thread records the partial rollback linked list information in a fixed area at the head of each transaction's cache file, which makes the transaction cache persistent, avoids having to collect all data again during fault recovery, and speeds up recovery.
This embodiment provides a synchronization method based on log analysis, applied to a destination-end data synchronization system that is provided with a log receiving thread and an execution thread. Specifically, the destination-end data synchronization system creates the log receiving thread and the execution thread after it starts. The log receiving thread is responsible for receiving the operations sent by the source end; the execution thread is responsible for applying committed transactions to the destination database.
A transaction cache file is set up on disk for each transaction. Each transaction cache file has an associated variable y and contains a partial rollback linked list, a storage LSN, transaction information and the end-of-file offset.
When the destination-end data synchronization system starts after a fault, it first loads the transaction cache files that existed before the fault and reads from each one the transaction information, the storage LSN, the end-of-file offset and the partial rollback linked list information. This restores the state of the transactions the system had received at the time of the last fault or shutdown, so that the source end can resume transmission from the breakpoint and transaction consistency is preserved during synchronization.
Referring now to FIG. 1, the synchronization method includes the following steps:
Step 101: the log receiving thread determines the type of the operation.
The synchronization system is deployed at both the source database and the destination database: the source-end data synchronization system reads the log from the source database, and the destination-end data synchronization system applies the synchronization operations sent by the source end to the destination database.
Referring to FIG. 2, the log receiving thread at the destination end parses the received log to obtain an operation and determines its type: when the operation is a DML operation, step 102 is executed; when it is a partial rollback operation, step 104 is executed; when it is a commit operation, step 106 is executed.
When the operation is a rollback operation, the transaction cache file corresponding to the rollback operation is deleted and all of the transaction's operations cached in memory are released.
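The dispatch on operation type can be sketched as follows. This is only an in-memory illustration under stated assumptions: the dictionary keys (`kind`, `txid`, `op_no`, `target_op_no`, `lsn`, `payload`), the `TxnState` class and the `receive` function are hypothetical names, the disk cache file is replaced by a plain list, and the merging of adjacent intervals is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TxnState:
    ops: List[Tuple[int, str]] = field(default_factory=list)  # (op number, payload)
    y: int = 0                 # operation number of the latest live DML operation
    stored_lsn: int = 0        # "storage LSN" of the transaction
    rollback_intervals: List[Tuple[int, int]] = field(default_factory=list)


def receive(transactions: Dict[str, TxnState], op: dict) -> Optional[TxnState]:
    """Handle one received operation; return the transaction when it commits."""
    kind, txid = op["kind"], op["txid"]
    if kind == "DML":
        txn = transactions.setdefault(txid, TxnState())
        txn.ops.append((op["op_no"], op["payload"]))
        txn.y = op["op_no"]
        txn.stored_lsn = op["lsn"]
    elif kind == "PARTIAL_ROLLBACK":
        txn = transactions.setdefault(txid, TxnState())
        x = op["target_op_no"]
        txn.rollback_intervals.append((x, txn.y))  # record [x, y], touch no cached op
        txn.rollback_intervals.sort()              # keep ascending order of x
        txn.y = x - 1                              # y update of the preferred embodiment
        txn.stored_lsn = op["lsn"]
    elif kind == "COMMIT":
        return transactions.pop(txid, None)        # hand over to the execution thread
    elif kind == "ROLLBACK":
        transactions.pop(txid, None)               # drop everything cached for the txn
    return None
```

A caller would feed every operation received from the source end to `receive` and pass any returned transaction on to the execution thread.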
Step 102: when the operation is a DML operation, acquire the operation number of the DML operation and the transaction ID to which it belongs, and determine the corresponding transaction cache file according to the transaction ID.
When the source end sends the operations of a transaction, it must fill in the operation number of each operation so that the destination end can implement partial rollback by operation number. Specifically, each operation in the database log stream has its own operation number within the transaction to which it belongs, and operation numbers increase from 1. Some databases (e.g., ORACLE) do not record operation numbers in their logs, but the source-end log parsing process can use other techniques to assign a simulated operation number to each operation.
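The source does not say how simulated operation numbers are produced. One naive possibility, shown here purely as an assumption, is a per-transaction counter maintained by the source-end parser; the class `OperationNumberer` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Dict


class OperationNumberer:
    """Assigns numbers 1, 2, 3, ... to the operations of each transaction."""

    def __init__(self) -> None:
        self._last: Dict[str, int] = defaultdict(int)

    def next_number(self, txid: str) -> int:
        # Operation numbers are independent per transaction and start at 1
        self._last[txid] += 1
        return self._last[txid]

    def finish(self, txid: str) -> None:
        # Forget the counter once the transaction commits or rolls back
        self._last.pop(txid, None)
```

A real parser would also have to translate the rollback target recorded in the log into one of these numbers, which this sketch does not attempt.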
Step 103: add the DML operation and the operation number to the corresponding transaction cache file, update the variable y to the operation number of the current DML operation, and update the storage LSN to the log sequence number of the current DML operation.
In this embodiment, operations are managed by the transaction ID they carry. An operation is first added to memory; once the cache critical point is reached, the operations in memory are appended to the transaction cache file. The operation number of the current operation is recorded in the transaction's dedicated variable y.
Specifically, the DML operation and the operation number are stored in the corresponding memory, and it is judged whether the cache critical point has been reached. If it has, all DML operations in memory are compressed to obtain compressed data, and the compressed data and the interval information of the partial rollback linked list are added to the corresponding transaction cache file. The variable y is updated to the operation number of the current DML operation, and the storage LSN is updated to the log sequence number of the current DML operation.
The cache critical point may be defined as the number of cached operations reaching a set value N. When the number of operations cached in memory reaches N, the N operations are packed, compressed and appended to the corresponding transaction cache file, the end-of-file offset is recorded, the LSN of the current operation is saved as the storage LSN of the transaction, and the next operation is received. When the number of operations cached in memory has not reached N, the LSN of the current operation is taken as the storage LSN of the transaction and the next operation is received.
In this scheme, the critical point N of each transaction's cached operation count controls the scale of the transaction cache. The value of N can be tuned to the usage scenario in practice; for example, when memory is plentiful, a larger N reduces how often the destination-end data synchronization system performs IO and prevents an IO bottleneck from limiting synchronization performance.
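The batching behaviour around the critical point N might look like the sketch below. It is an illustration under assumptions: the class `TxnBuffer` is hypothetical, JSON plus zlib stands in for whatever packing and compression format the real system uses, and a list of byte blocks stands in for the transaction cache file.

```python
import json
import zlib
from typing import List, Tuple


class TxnBuffer:
    def __init__(self, n: int) -> None:
        self.n = n                                # cache critical point N
        self.pending: List[Tuple[int, str]] = []  # operations still in memory
        self.blocks: List[bytes] = []             # stand-in for the cache file
        self.end_offset = 4096                    # data starts after the reserved head
        self.y = 0
        self.stored_lsn = 0

    def add(self, op_no: int, payload: str, lsn: int) -> None:
        self.pending.append((op_no, payload))
        if len(self.pending) >= self.n:
            # Pack the N operations together and compress them as one block
            block = zlib.compress(json.dumps(self.pending).encode("utf-8"))
            self.blocks.append(block)             # sequential append only
            self.end_offset += len(block)
            self.pending.clear()
        self.y = op_no                            # latest DML operation number
        self.stored_lsn = lsn                     # LSN of the current operation
```

With `TxnBuffer(3)`, adding seven operations produces two compressed blocks and leaves one operation in memory; a larger N means fewer but larger writes.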
In this embodiment, operation data is cached per transaction: each transaction is cached in its own transaction cache file, and the file is named after the transaction ID for ease of management and lookup. When a transaction cache file is created, 4 KB of space is reserved at the head of the file to store the partial rollback linked list information; that is, the initial offset of the cached data in the file is 4096.
Because the operating system works in units of sectors when it operates on files, the data written to the transaction cache file each time is aligned to the sector size in bytes, which reduces the complexity of cache operations and improves IO performance.
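The file layout described above (a 4096-byte reserved head followed by sector-aligned data blocks) can be sketched as follows. The 512-byte sector size and the helper names `pad_to_sector` and `append_block` are assumptions made for illustration; the source only states that writes are aligned to the sector size.

```python
import io

HEADER_SIZE = 4096  # reserved head of the transaction cache file
SECTOR = 512        # assumed sector size in bytes


def pad_to_sector(data: bytes, sector: int = SECTOR) -> bytes:
    """Pad data with zero bytes up to the next multiple of the sector size."""
    remainder = len(data) % sector
    if remainder == 0:
        return data
    return data + b"\x00" * (sector - remainder)


def append_block(f: io.BufferedIOBase, end_offset: int, block: bytes) -> int:
    """Write one padded block at end_offset and return the new end offset."""
    padded = pad_to_sector(block)
    f.seek(end_offset)
    f.write(padded)
    return end_offset + len(padded)
```

The first block is written with `end_offset=HEADER_SIZE`, and each call returns the offset for the next one. A real format would also need to record each block's unpadded length so that the padding can be stripped on read.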
Step 104: when the operation is a partial rollback operation, acquire the transaction ID to which the partial rollback operation belongs and the rollback target operation number x, and determine the corresponding transaction cache file according to the transaction ID to obtain the target variable y.
In addition to the transaction ID it refers to, a partial rollback operation contains the operation number it rolls back to (the target operation number x). This means that the transaction rolls back from the current operation number position down to the designated operation number, inclusive of the operation bearing that number.
Step 105: construct a partial rollback interval [x, y] from the target operation number x and the target variable y, add the interval to the partial rollback linked list, and update the storage LSN to the log sequence number of the current partial rollback operation.
In this embodiment, when the current operation is a partial rollback operation, the transaction ID and the rollback target operation number x are extracted from the operation, and the transaction ID is used to locate the transaction and its transaction cache file, from which the target variable y is obtained.
A partial rollback interval [x, y] is then constructed from the target operation number x and the target variable y and added to the partial rollback linked list. Intervals are inserted in order of their starting number x, so that the intervals in the linked list are always arranged from small to large.
In a preferred embodiment, each time a partial rollback interval is added, it is determined whether it is adjacent to the preceding or following interval; if so, the adjacent intervals are merged into a larger interval that replaces the smaller intervals in the original linked list.
The specific implementation is as follows. A partial rollback interval [x, y] is constructed from the target operation number x and the target variable y.
The partial rollback interval [x, y] is added to the partial rollback linked list in ascending order of the target operation number x, and it is judged whether the newly added interval and an existing interval are adjacent. If they are adjacent, the two are merged to obtain a merged partial rollback interval, and the variable y is updated from the merged interval: 1 is subtracted from the starting value x of the merged interval to obtain a new value x', and x' is assigned to the variable y. If they are not adjacent, the variable y is updated from the newly added interval: 1 is subtracted from the starting value x of the newly added interval to obtain a new value x', and x' is assigned to the variable y. Two intervals are adjacent when the y value of the preceding interval plus 1 equals the x value of the following interval.
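The ordered insertion and adjacent-interval merge can be sketched in a few lines. A Python list of tuples stands in for the partial rollback linked list, and the function name `add_partial_rollback` is hypothetical; the adjacency test and the update of y follow the description above.

```python
import bisect
from typing import List, Tuple


def add_partial_rollback(intervals: List[Tuple[int, int]], x: int, y: int) -> int:
    """Insert [x, y] in ascending order of x, merge adjacent neighbours in place,
    and return the new value of the variable y (interval start minus 1)."""
    pos = bisect.bisect_left(intervals, (x, y))
    intervals.insert(pos, (x, y))
    # Merge with the following interval when new.y + 1 == next.x
    if pos + 1 < len(intervals) and intervals[pos][1] + 1 == intervals[pos + 1][0]:
        intervals[pos:pos + 2] = [(intervals[pos][0], intervals[pos + 1][1])]
    # Merge with the preceding interval when prev.y + 1 == new.x
    if pos > 0 and intervals[pos - 1][1] + 1 == intervals[pos][0]:
        intervals[pos - 1:pos + 1] = [(intervals[pos - 1][0], intervals[pos][1])]
        pos -= 1
    # x' = x - 1 of the (possibly merged) interval becomes the new y
    return intervals[pos][0] - 1
```

As a worked example, suppose operations 1 to 10 are cached and a rollback to operation 6 arrives: the list becomes [(6, 10)] and y becomes 5. A second rollback to operation 3 then builds [3, 5], which is adjacent to [6, 10], so the list collapses to [(3, 10)] and y becomes 2.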
Step 106: when the operation is a commit operation, the corresponding transaction is dispatched to the execution thread, which performs data synchronization according to the operation number of each operation to be executed and the corresponding partial rollback linked list.
When the current operation is a commit operation, the corresponding transaction is located by the transaction ID of the operation and dispatched to an execution thread, which executes it and applies it to the destination database.
In addition, the destination data synchronization system is provided with a checkpoint thread, which periodically saves the transaction information received by the destination data synchronization system and sets a recovery point for use after a fault. In this embodiment, every S seconds the checkpoint thread traverses the currently cached transactions, writes the partial rollback linked list information of each transaction to the corresponding transaction file, and performs the operations below. The checkpoint interval S controls the fault recovery time: in an environment with frequent business activity, shortening the interval helps recovery complete quickly after a fault.
For each transaction, the storage LSN recorded at the last checkpoint is compared with the current storage LSN. If the current storage LSN is not greater than the storage LSN recorded at the last checkpoint, the transaction has received no new operation since the last checkpoint; it is skipped without any disk write, and the next transaction is taken. Otherwise a disk write is performed: the operations in the memory buffer of the transaction are packed and compressed and then appended to the transaction cache file of that transaction, and the offset of the end of the file is recorded. The storage LSN of the transaction, the end-of-file offset, and the interval information in the partial rollback linked list are then saved into the 4K space reserved at the head of the transaction cache file. Once this is done, the next transaction is taken.
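The per-transaction checkpoint step can be sketched as follows. Everything here is an assumption made for illustration: the names `TxnCache` and `checkpoint`, the JSON header encoding, and the use of an in-memory `bytearray` in place of the transaction cache file are not taken from the patent, which does not specify the header layout. The skip test is written as "the storage LSN has not advanced since the last checkpoint", which is how this sketch reads the condition.

```python
import json
import struct
import zlib
from dataclasses import dataclass, field

HEADER_SIZE = 4096  # the 4K space reserved at the head of the file


@dataclass
class TxnCache:
    store_lsn: int = 0        # current storage LSN of the transaction
    checkpoint_lsn: int = 0   # storage LSN saved by the last checkpoint
    pending: list = field(default_factory=list)    # operations still in memory
    intervals: list = field(default_factory=list)  # partial rollback intervals
    # In-memory stand-in for the cache file, header area pre-allocated.
    file: bytearray = field(default_factory=lambda: bytearray(HEADER_SIZE))


def checkpoint(txn: TxnCache) -> bool:
    """Flush one transaction; return False if it was skipped."""
    if txn.store_lsn <= txn.checkpoint_lsn:
        return False  # nothing new since the last checkpoint

    if txn.pending:
        # Pack and compress the buffered operations, then append them.
        blob = zlib.compress(json.dumps(txn.pending).encode("utf-8"))
        txn.file += struct.pack("<I", len(blob)) + blob
        txn.pending.clear()

    header = json.dumps({
        "store_lsn": txn.store_lsn,
        "end_offset": len(txn.file),
        "intervals": txn.intervals,
    }).encode("utf-8")
    if len(header) > HEADER_SIZE:
        raise ValueError("header does not fit in the reserved space")

    # Overwrite only the reserved head; the body stays append-only.
    txn.file[:HEADER_SIZE] = header.ljust(HEADER_SIZE, b"\0")
    txn.checkpoint_lsn = txn.store_lsn
    return True
```

Because the header is the only region ever rewritten, the operation data itself is written strictly sequentially in this sketch.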
In this embodiment, the purpose of the checkpoint thread is to let the destination end periodically set a recovery point while it receives operations from the source end. The data of the currently active transactions is saved before the recovery point is set, so that after recovery from a fault the source end can resume log analysis from the recovery point, which realizes resumable transmission from the breakpoint.
In this embodiment, each DML operation carries an incrementing operation number, and partial rollback intervals are formed from these numbers. Cached operations that need to be rolled back are neither deleted nor marked, which reduces the impact of partial rollback on the cached transaction; the transaction cache thus trades space for performance. During caching, the partial rollback linked list is also saved to the cache file, which speeds up recovery from synchronization exceptions and avoids caching again the transaction operations that were already cached before the fault.
The implementation of step 106 is described in detail below with reference to fig. 3:
First, after receiving a transaction to be executed, the execution thread takes an operation to be executed from the corresponding transaction cache file and obtains its operation number z.
Partial rollback intervals [x, y] are then extracted in sequence from the partial rollback linked list, and whether the operation is rolled back is determined from the relationship between the operation number z and the interval [x, y], so as to perform data synchronization.
Specifically, it is determined whether the operation number z is smaller than the interval start value x. If z < x, the operation is executed and the next operation to be executed is taken out.
If z is not smaller than x, it is determined whether z is greater than the interval end value y. If z is not greater than y, that is, z >= x and z <= y, the operation falls within the partial rollback interval; it is discarded and the next operation to be executed is taken out.
If z > y, the next partial rollback interval [x, y] is taken out and the same comparison is repeated, until all partial rollback intervals have been traversed.
In an actual application scenario, if z is greater than the end value y of the last partial rollback interval, all partial rollback intervals have been consumed. The operation is executed, and every subsequent operation is executed directly as soon as it is taken out, with no further rollback check.
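The comparison of z against the intervals can be sketched as a single pass over both sequences. The function name `surviving_operations` is hypothetical, and the sketch assumes that operation numbers arrive in ascending order and that the intervals are sorted by start value, as the description states.

```python
def surviving_operations(op_numbers, intervals):
    """Return the operation numbers that must be executed.

    op_numbers: operation numbers z in ascending order.
    intervals:  partial rollback intervals [x, y], sorted by x.
    """
    executed = []
    i = 0  # index of the interval currently being compared
    for z in op_numbers:
        # z > y: this interval is exhausted, move on to the next one.
        while i < len(intervals) and z > intervals[i][1]:
            i += 1
        if i == len(intervals) or z < intervals[i][0]:
            # No interval left, or z < x: the operation is executed.
            executed.append(z)
        # Otherwise x <= z <= y: the operation is discarded.
    return executed
```

The interval index only moves forward, so a transaction with n operations and m intervals is filtered in O(n + m) comparisons without touching the cached operations themselves.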
The basic points of the above embodiment can be summarized as follows:
First, when handling partial rollback of a transaction, the invention neither deletes the rolled-back operations from the cache file (which avoids random IO during rollback) nor adds a delete mark to the cached operation records (so the cached operations need not be stored in plaintext). The transaction cache is therefore always written sequentially, which allows several operations to be packed, compressed, and then stored when the file is written, saving disk space and reducing IO pressure.
Second, partially rolled-back operations are managed in a linked list, and consecutive partial rollbacks are combined by merging adjacent partial rollback intervals. This shortens the partial rollback linked list and makes it easy to persist when the checkpoint thread flushes to disk. During transaction execution, each operation number is located against the intervals in the partial rollback linked list, and operations falling within a rollback interval are discarded, which realizes the partial rollback function.
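The sequential, pack-then-compress write path can be illustrated with the sketch below. The on-disk layout is an assumption of this example (a 4-byte little-endian length prefix followed by a zlib-compressed JSON batch); the patent's actual file format is shown only in its figures. The names `TxnCacheWriter` and `read_operations` are hypothetical.

```python
import io
import json
import struct
import zlib


class TxnCacheWriter:
    """Buffers operations in memory and appends them in compressed batches."""

    def __init__(self, stream, threshold=3):
        self.stream = stream        # any append-only binary stream
        self.threshold = threshold  # number of operations per batch
        self.buffer = []

    def add(self, op_number, sql):
        self.buffer.append([op_number, sql])
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        blob = zlib.compress(json.dumps(self.buffer).encode("utf-8"))
        # Length prefix lets the reader find batch boundaries.
        self.stream.write(struct.pack("<I", len(blob)) + blob)
        self.buffer = []


def read_operations(stream):
    """Yield [op_number, sql] pairs in the order they were written."""
    while True:
        head = stream.read(4)
        if len(head) < 4:
            return
        (size,) = struct.unpack("<I", head)
        yield from json.loads(zlib.decompress(stream.read(size)))


# Usage: six operations with a threshold of 3 produce two batches.
stream = io.BytesIO()
writer = TxnCacheWriter(stream, threshold=3)
for n in range(1, 7):
    writer.add(n, f"INSERT INTO T1(ID)VALUES({n})")
stream.seek(0)
numbers = [op[0] for op in read_operations(stream)]
```

Nothing in the stream is ever rewritten or marked: a rolled-back operation stays inside its compressed batch and is filtered out only at execution time.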
Example 2:
To facilitate understanding of embodiment 1, the scheme is illustrated with the following example:
Both the source database and the destination database contain a table T1 (ID INT), and an application at the source end runs a transaction that performs the following operations on table T1:
INSERT INTO T1(ID)VALUES('1');
SAVEPOINT SP2;
INSERT INTO T1(ID)VALUES('2');
SAVEPOINT SP3;
INSERT INTO T1(ID)VALUES('3');
ROLLBACK TO SAVEPOINT SP3;
ROLLBACK TO SAVEPOINT SP2;
INSERT INTO T1(ID)VALUES('4');
SAVEPOINT SP5;
INSERT INTO T1(ID)VALUES('5');
ROLLBACK TO SAVEPOINT SP5;
INSERT INTO T1(ID)VALUES('6');
COMMIT;
The above operations produce the following log operations:

| Operation number | Operation                      | LSN |
|------------------|--------------------------------|-----|
| 1                | INSERT INTO T1(ID)VALUES('1')  | 1   |
| 2                | INSERT INTO T1(ID)VALUES('2')  | 2   |
| 3                | INSERT INTO T1(ID)VALUES('3')  | 3   |
|                  | ROLLBACK TO operation number 3 | 4   |
|                  | ROLLBACK TO operation number 2 | 5   |
| 4                | INSERT INTO T1(ID)VALUES('4')  | 6   |
| 5                | INSERT INTO T1(ID)VALUES('5')  | 7   |
|                  | ROLLBACK TO operation number 5 | 8   |
| 6                | INSERT INTO T1(ID)VALUES('6')  | 9   |
|                  | COMMIT;                        | 10  |

The transaction caching process is as follows:
The destination synchronization system is started. Assume that the threshold on the number of cached operations per transaction is 3 and that the sector size of the current operating system is 512 bytes. Three INSERT operations are received, with operation numbers 1, 2, and 3; the three operations are packed, compressed, and stored, forming the file format shown in fig. 4:
After the above operations are complete, the operation number held in the variable y is 3.
A partial rollback operation is received that requires rollback to operation number 3. A rollback interval is constructed from this number and the number in the variable y and added to the partial rollback linked list according to the rules, giving { [3,3] }.
The interval start number x minus 1 is assigned to y, so y now holds operation number 2.
A partial rollback operation is received that requires rollback to operation number 2. A rollback interval is constructed from this number and the number in the variable y and added to the partial rollback linked list according to the rules, giving { [2,2], [3,3] }.
Adjacent rollback intervals now exist in the partial rollback linked list and must be merged; after merging, the list is { [2,3] }.
The interval start number x minus 1 is assigned to y, so y now holds operation number 1.
Two INSERT operations are received, numbered 4 and 5; y now holds operation number 5.
A partial rollback operation is received that requires rollback to operation number 5. A rollback interval is constructed from this number and the number in the variable y and added to the partial rollback linked list according to the rules, giving { [2,3], [5,5] }. By the same rule as before, y is then set to 4.
An INSERT operation numbered 6 is received. The three operations numbered 4, 5, and 6 cached in memory are packed, compressed, and stored, forming the file format shown in fig. 5:
If the checkpoint thread saves the transaction at this point, the current end-of-file offset, the storage LSN, and the partial rollback linked list of the transaction are saved into the first 4K of the transaction file, forming the file format shown in fig. 6:
The COMMIT operation is received, and the transaction is dispatched to the execution thread for execution.
The execution thread fetches the first INSERT (id=1) operation, whose operation number is 1.
The first partial rollback interval [2,3] is extracted.
According to the rule, operation number 1 is smaller than the interval start value 2, so this operation must be executed:
INSERT INTO T1(ID)VALUES(1);
The second INSERT (id=2) operation is extracted, with an operation number of 2. According to the rule, operation number 2 falls within the partial rollback interval [2,3], so the operation is discarded without being executed.
A third INSERT (id=3) operation is extracted, with an operation number of 3, and according to the rule, the operation number 3 falls within the partial rollback interval [2,3], discarding the operation directly without execution.
A fourth INSERT (id=4) operation is extracted, with an operation number of 4. According to the rule, operation number 4 is greater than the end value of the partial rollback interval [2,3].
The next partial rollback interval, [5,5], is extracted.
Continuing with the fourth operation, operation number 4 is smaller than the interval start value 5, so the operation must be executed:
INSERT INTO T1(ID)VALUES(4);
A fifth INSERT (id=5) operation is extracted, with an operation number of 5. According to the rule, operation number 5 falls within the partial rollback interval [5,5], so the operation is discarded without being executed.
A sixth INSERT (id=6) operation is extracted, with an operation number of 6. According to the rule, operation number 6 is greater than the end value of the partial rollback interval [5,5].
At this point all partial rollback intervals have been extracted, so this operation and any operations after it are executed:
INSERT INTO T1(ID)VALUES(6);
COMMIT is then executed to complete the synchronization.
In this process, the operations in the cache file are traversed and the operations to be discarded are identified from the rollback operation number intervals recorded in the partial rollback linked list, which realizes the partial rollback function of the transaction.
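As a cross-check on this walk-through, the same log can be replayed with a naive baseline that physically deletes rolled-back operations. This is not the method of the embodiment (which deliberately avoids such deletion); it is only a reference implementation of savepoint semantics, and the function name `naive_replay` and the tuple encoding of the log are assumptions of this sketch.

```python
def naive_replay(log):
    """Baseline: drop every operation numbered >= n on 'rollback to n'."""
    kept = []
    for kind, number in log:
        if kind == "dml":
            kept.append(number)
        elif kind == "rollback_to":
            # Physically remove the rolled-back operations.
            kept = [k for k in kept if k < number]
    return kept


# The log of Example 2, in LSN order (COMMIT omitted).
example_log = [
    ("dml", 1), ("dml", 2), ("dml", 3),
    ("rollback_to", 3), ("rollback_to", 2),
    ("dml", 4), ("dml", 5),
    ("rollback_to", 5),
    ("dml", 6),
]

surviving = naive_replay(example_log)  # [1, 4, 6]
```

The baseline keeps operations 1, 4, and 6, which are exactly the three INSERT statements executed in the walk-through above.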
Example 3:
Fig. 7 is a schematic structural diagram of a synchronization system according to an embodiment of the invention. The synchronization system of this embodiment includes one or more processors 41 and a memory 42; fig. 7 takes one processor 41 as an example.
The processor 41 and the memory 42 may be connected by a bus or by other means; fig. 7 takes a bus connection as an example.
As a non-volatile computer-readable storage medium, the memory 42 can store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions corresponding to the synchronization methods of the above embodiments. The processor 41 executes the non-volatile software programs, instructions, and modules stored in the memory 42 to perform various functional applications and data processing, thereby implementing the methods of the foregoing embodiments.
The memory 42 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 42 may optionally include memory located remotely from processor 41, which may be connected to processor 41 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It should be noted that, because the content of information interaction and execution process between modules and units in the above-mentioned device and system is based on the same concept as the processing method embodiment of the present invention, specific content may be referred to the description in the method embodiment of the present invention, and will not be repeated here.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods in the embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, or the like.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A synchronization method based on log analysis, characterized in that the synchronization method is applied to a destination data synchronization system, the destination data synchronization system is provided with a log receiving thread and an execution thread, and a transaction cache file is set up on disk for each transaction, wherein the transaction cache file is associated with a variable y and comprises a partial rollback linked list and a storage LSN;
the synchronization method comprises the following steps:
the log receiving thread determines the type of an operation;
when the operation is a DML operation, acquiring the operation number of the DML operation and the transaction ID to which the DML operation belongs, and determining the corresponding transaction cache file according to the transaction ID;
adding the DML operation and the operation number to the corresponding transaction cache file, updating the variable y to equal the operation number of the current DML operation, and updating the storage LSN to equal the log sequence number of the current DML operation;
when the operation is a partial rollback operation, acquiring the transaction ID to which the partial rollback operation belongs and the rollback target operation number x, and determining the corresponding transaction cache file according to the transaction ID to obtain the target variable y;
constructing a partial rollback interval [x, y] from the target operation number x and the target variable y, adding the partial rollback interval [x, y] to the partial rollback linked list, and updating the storage LSN to equal the log sequence number of the current partial rollback operation;
and when the operation is a commit operation, dispatching the corresponding transaction to the execution thread, the execution thread performing data synchronization according to the operation number of each operation to be executed and the corresponding partial rollback linked list.
2. The synchronization method of claim 1, wherein constructing a partial rollback interval [x, y] from the target operation number x and the target variable y and adding the partial rollback interval [x, y] to the partial rollback linked list comprises:
constructing a partial rollback interval [x, y] from the target operation number x and the target variable y;
adding the partial rollback interval [x, y] to the partial rollback linked list in ascending order of the target operation number x;
determining whether the newly added partial rollback interval [x, y] and an existing partial rollback interval [x, y] are adjacent intervals;
if they are adjacent intervals, merging the newly added partial rollback interval [x, y] with the existing partial rollback interval [x, y] to obtain a merged partial rollback interval;
updating the variable y to the start value x of the merged partial rollback interval minus 1;
if they are not adjacent intervals, updating the variable y to the start value x of the newly added partial rollback interval minus 1.
3. The synchronization method according to claim 2, wherein two intervals are adjacent when the y value of the preceding interval plus 1 equals the x value of the following interval.
4. The synchronization method according to claim 1, characterized in that the synchronization method further comprises:
when the operation is a rollback operation, deleting the transaction cache file corresponding to the rollback operation and releasing all operations cached in memory.
5. The synchronization method of claim 1, wherein adding the DML operation and the operation number to the corresponding transaction cache file, updating the variable y to equal the operation number of the current DML operation, and updating the storage LSN to equal the log sequence number of the current DML operation comprises:
first storing the DML operation and the operation number in the corresponding memory;
determining whether the cache critical point is reached;
if the cache critical point is reached, compressing all DML operations in the memory to obtain compressed data, and adding the compressed data and the interval information of the partial rollback linked list to the corresponding transaction cache file;
updating the variable y to equal the operation number of the current DML operation, and updating the storage LSN to equal the log sequence number of the current DML operation.
6. The synchronization method according to claim 1, wherein the execution thread performing data synchronization according to the operation number of each operation to be executed and the corresponding partial rollback linked list comprises:
after receiving a transaction to be executed, the execution thread takes an operation to be executed from the corresponding transaction cache file and acquires the operation number z of the operation to be executed;
sequentially extracting partial rollback intervals [x, y] from the partial rollback linked list;
and determining whether to perform the partial rollback operation according to the relationship between the operation number z and the partial rollback interval [x, y], so as to perform data synchronization.
7. The synchronization method according to claim 6, wherein determining whether to perform the partial rollback operation according to the relationship between the operation number z and the partial rollback interval [x, y] comprises:
determining whether the operation number z is smaller than the interval start value x;
if the operation number z is smaller than the interval start value x, executing the operation to be executed and taking out the next operation to be executed;
if the operation number z is not smaller than the interval start value x, determining whether the operation number z is greater than the interval end value y;
if the operation number z is not greater than the interval end value y, the operation to be executed belongs to the partial rollback interval; discarding the operation to be executed and taking out the next operation to be executed.
8. The synchronization method according to claim 7, characterized in that the synchronization method further comprises:
if the operation number z is greater than the interval end value y, taking out the next partial rollback interval [x, y], and determining whether to perform the partial rollback operation according to the relationship between the operation number z and the next partial rollback interval [x, y], until all partial rollback intervals have been traversed.
9. The synchronization method according to claim 8, characterized in that the synchronization method further comprises:
if the operation number z is greater than the interval end value y of the last partial rollback interval, executing the operation to be executed;
and, after the next operation to be executed is taken out, executing it directly.
10. A synchronization system, wherein the synchronization system comprises at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being programmed to perform the synchronization method of any one of claims 1-9.
CN202011056091.2A 2020-09-30 2020-09-30 Synchronization method and synchronization system based on log analysis Active CN112307117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011056091.2A CN112307117B (en) 2020-09-30 2020-09-30 Synchronization method and synchronization system based on log analysis

Publications (2)

Publication Number Publication Date
CN112307117A CN112307117A (en) 2021-02-02
CN112307117B true CN112307117B (en) 2023-12-12

Family

ID=74488248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011056091.2A Active CN112307117B (en) 2020-09-30 2020-09-30 Synchronization method and synchronization system based on log analysis

Country Status (1)

Country Link
CN (1) CN112307117B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114385752B (en) * 2021-12-15 2025-03-28 武汉达梦数据库股份有限公司 A method and device for data synchronization operation numbering
CN115718786B (en) * 2022-11-30 2025-06-17 武汉达梦数据库股份有限公司 A method and device for log parsing and synchronous transaction storage
CN119493634A (en) * 2024-11-15 2025-02-21 武汉达梦数据库股份有限公司 A method, device and system for setting transaction operation dependency based on transaction status
CN119473520B (en) * 2024-11-15 2025-09-30 武汉达梦数据库股份有限公司 Method, device and system for collecting conflicting TRXID setting operation dependencies during rollback

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10452648B1 (en) * 2015-12-07 2019-10-22 Gravic, Inc. Method of ensuring transactional integrity of a system that includes a plurality of subsystems, one of which takes an action upon a loss of transactional integrity
KR20200056357A (en) * 2020-03-17 2020-05-22 주식회사 실크로드소프트 Technique for implementing change data capture in database management system
CN111694893A (en) * 2020-04-23 2020-09-22 武汉达梦数据库有限公司 Partial rollback analysis method based on log analysis and data synchronization system
CN111694798A (en) * 2020-04-23 2020-09-22 武汉达梦数据库有限公司 Data synchronization method and data synchronization system based on log analysis

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8145686B2 (en) * 2005-05-06 2012-03-27 Microsoft Corporation Maintenance of link level consistency between database and file system
US20180144015A1 (en) * 2016-11-18 2018-05-24 Microsoft Technology Licensing, Llc Redoing transaction log records in parallel
US10698921B2 (en) * 2017-02-28 2020-06-30 Sap Se Persistence and initialization of synchronization state for serialized data log replay in database systems

Also Published As

Publication number Publication date
CN112307117A (en) 2021-02-02

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant after: Wuhan dream database Co.,Ltd.

Address before: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant before: WUHAN DAMENG DATABASE Co.,Ltd.

CB02 Change of applicant information
CB03 Change of inventor or designer information

Inventor after: Sun Feng

Inventor after: Peng Qingsong

Inventor after: Liu Qichun

Inventor before: Sun Feng

Inventor before: Fu Quan

Inventor before: Peng Qingsong

Inventor before: Liu Qichun

CB03 Change of inventor or designer information
GR01 Patent grant
GR01 Patent grant