CN115470290A - Increment synchronization method and device based on materialized view logs and computer equipment - Google Patents
Increment synchronization method and device based on materialized view logs and computer equipment Download PDFInfo
- Publication number
- CN115470290A CN115470290A CN202210911027.0A CN202210911027A CN115470290A CN 115470290 A CN115470290 A CN 115470290A CN 202210911027 A CN202210911027 A CN 202210911027A CN 115470290 A CN115470290 A CN 115470290A
- Authority
- CN
- China
- Prior art keywords
- incremental
- data
- operations
- materialized view
- increment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24573—Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Library & Information Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
本发明涉及数据同步技术领域,公开了一种基于物化视图日志的增量同步方法、装置以及计算机设备。其中,该方法包括:获取源数据库中的数据源表,创建数据源表对应的物化视图日志,该物化视图日志包括多个属性列,属性列用于确定数据源表的增量操作所涉及的字段和更新值;当产生针对于数据源表的增量操作时,基于数据源表的元数据确定增量操作对应的字段;当增量操作对应的字段处于属性列中,则从物化视图日志中提取增量操作对应的增量数据;将增量数据同步至目标数据库。通过实施本发明,减少了回表查询的次数,提升了增量数据的同步效率,提高了数据库增量同步的性能。
The invention relates to the technical field of data synchronization, and discloses an incremental synchronization method, device and computer equipment based on materialized view logs. Wherein, the method includes: obtaining the data source table in the source database, creating a materialized view log corresponding to the data source table, the materialized view log includes a plurality of attribute columns, and the attribute columns are used to determine the data source table involved in the incremental operation Fields and update values; when an incremental operation is generated for the data source table, the field corresponding to the incremental operation is determined based on the metadata of the data source table; when the field corresponding to the incremental operation is in the attribute column, the materialized view log Extract the incremental data corresponding to the incremental operation; synchronize the incremental data to the target database. Through the implementation of the invention, the number of table return queries is reduced, the synchronization efficiency of incremental data is improved, and the performance of database incremental synchronization is improved.
Description
技术领域technical field
本发明涉及数据同步技术领域,具体涉及一种基于物化视图日志的增量同步方法、装置以及计算机设备。The present invention relates to the technical field of data synchronization, in particular to an incremental synchronization method, device and computer equipment based on materialized view logs.
背景技术Background technique
数据库的增量同步需要从源数据库中抽取增量数据,经过分析和转换,编写成适用于目标数据库的数据库脚本文件和操作语言,以将增量数据存放至目标数据库中。Incremental synchronization of the database needs to extract the incremental data from the source database, analyze and convert it, and write it into a database script file and operating language suitable for the target database, so as to store the incremental data in the target database.
Oracle数据库为常用的数据库类型之一,也是数据库同步组件必须支持的数据库类型。目前,针对于Oracle数据库的增量数据同步,在基于物化视图日志技术进行数据同步时,需要读取物化视图日志,然后根据物化视图日志中的主键到数据源表中进行回表查询,从数据源表中获取完整的增量数据。然而,物化视图日志中的每条日志数据都需要进行回表查询,由于网络传输和数据库性能的影响,导致增量数据的抽取效率较低,进而影响了增量数据的同步效率。The Oracle database is one of the commonly used database types, and it is also the database type that must be supported by the database synchronization component. At present, for the incremental data synchronization of Oracle database, when data synchronization is performed based on the materialized view log technology, it is necessary to read the materialized view log, and then query the data source table according to the primary key in the materialized view log. The complete incremental data is obtained from the source table. However, each log data in the materialized view log needs to be queried back to the table. Due to the influence of network transmission and database performance, the extraction efficiency of incremental data is low, which in turn affects the synchronization efficiency of incremental data.
发明内容Contents of the invention
有鉴于此,本发明实施例提供了一种基于物化视图日志的增量同步方法、装置以及计算机设备,以解决增量数据的抽取效率较低,导致增量数据的同步效率较低的问题。In view of this, the embodiments of the present invention provide a materialized view log-based incremental synchronization method, device, and computer equipment to solve the problem of low incremental data extraction efficiency resulting in low incremental data synchronization efficiency.
根据第一方面,本发明实施例提供了一种基于物化视图日志的增量同步方法,包括:获取源数据库中的数据源表,创建所述数据源表对应的物化视图日志,所述物化视图日志包括多个属性列,所述属性列用于确定所述数据源表的增量操作所涉及的字段和更新值;当产生针对于所述数据源表的增量操作时,基于所述数据源表的元数据确定所述增量操作对应的字段;当所述增量操作对应的字段处于所述属性列中,则从所述物化视图日志中提取所述增量操作对应的增量数据;将所述增量数据同步至目标数据库。According to the first aspect, an embodiment of the present invention provides an incremental synchronization method based on a materialized view log, including: obtaining a data source table in a source database, creating a materialized view log corresponding to the data source table, and the materialized view The log includes a plurality of attribute columns, and the attribute columns are used to determine the fields and update values involved in the incremental operation of the data source table; when an incremental operation for the data source table is generated, based on the data The metadata of the source table determines the field corresponding to the incremental operation; when the field corresponding to the incremental operation is in the attribute column, extract the incremental data corresponding to the incremental operation from the materialized view log ; Synchronize the incremental data to the target database.
本发明实施例提供的基于物化视图日志的增量同步方法,通过在数据源表对应的物化视图日志中指定多个属性列,在产生增量操作时,结合多个属性列以及数据源表的元数据即可确定出增量操作所对应的字段。若该字段处于属性列中则从物化视图日志中提取增量数据,由此无需回表查询,减少了回表查询的次数,提升了增量数据的同步效率,提高了数据库增量同步的性能。In the incremental synchronization method based on the materialized view log provided by the embodiment of the present invention, by specifying multiple attribute columns in the materialized view log corresponding to the data source table, when generating an incremental operation, combine multiple attribute columns and data source table The metadata can determine the field corresponding to the incremental operation. If the field is in the attribute column, the incremental data is extracted from the materialized view log, thus eliminating the need to return to the table for query, reducing the number of table return queries, improving the synchronization efficiency of incremental data, and improving the performance of database incremental synchronization .
结合第一方面,在第一方面的第一实施方式中,多个所述属性列包括字段变更向量列、增量操作类型列以及数据列,所述当产生针对于所述数据源表的增量操作时,确定所述增量操作对应的字段,包括:当产生针对于所述数据源表的增量操作时,将所述增量操作更新在所述增量操作类型列;基于所述增量操作类型列、所述字段变更向量列以及所述数据列,确定对应于所述增量操作的字段。With reference to the first aspect, in the first implementation manner of the first aspect, the plurality of attribute columns include a field change vector column, an increment operation type column, and a data column, and when generating an increment for the data source table When performing an incremental operation, determining the field corresponding to the incremental operation includes: when generating an incremental operation for the data source table, updating the incremental operation in the incremental operation type column; based on the The incremental operation type column, the field change vector column and the data column determine the field corresponding to the incremental operation.
本发明实施例提供的基于物化视图日志的增量同步方法,将增量操作更新在增量操作类型列,以便根据增量操作类型、字段变更向量列以及数据列确定出产生数据更新的字段,即从物化视图日志中确定出产生增量操作的字段。由此能够根据增量操作确认发生更新的数据,无需返回数据源表进行数据查询,减少增量数据的回表查询时间,提高了数据同步效率。In the incremental synchronization method based on the materialized view log provided by the embodiment of the present invention, the incremental operation is updated in the incremental operation type column, so as to determine the field generating the data update according to the incremental operation type, the field change vector column, and the data column. That is, the field that generates the incremental operation is determined from the materialized view log. In this way, the updated data can be confirmed according to the incremental operation, and there is no need to return to the data source table for data query, which reduces the query time of incremental data back to the table, and improves the efficiency of data synchronization.
结合第一方面第一实施方式,在第一方面的第二实施方式中,所述基于所述增量操作类型列、所述字段变更向量列以及所述数据列,确定对应于所述增量操作的字段,包括:基于所述物化视图的主键信息,确定相同主键信息对应的增量操作;当所述增量操作存在多个时,基于预设规则对多个所述增量操作进行合并,得到目标增量操作;基于所述字段变更向量列、所述目标增量操作以及所述数据列,确定所述目标增量操作在所述物化视图日志中的字段。With reference to the first implementation of the first aspect, in the second implementation of the first aspect, based on the increment operation type column, the field change vector column, and the data column, determine the The field of operation, including: based on the primary key information of the materialized view, determine the incremental operation corresponding to the same primary key information; when there are multiple incremental operations, merge multiple incremental operations based on preset rules , to obtain a target incremental operation; based on the field change vector column, the target incremental operation, and the data column, determine the field of the target incremental operation in the materialized view log.
本发明实施例提供的基于物化视图日志的增量同步方法,通过将多个增量操作进行合并,便于确认增量数据的最终状态,进一步减少了回表查询次数和增量数据的规模,便于后续操作。The incremental synchronization method based on the materialized view log provided by the embodiment of the present invention facilitates confirmation of the final state of the incremental data by merging multiple incremental operations, and further reduces the number of queries back to the table and the scale of the incremental data, facilitating Follow up.
结合第一方面第二实施方式,在第一方面的第三实施方式中,所述增量操作的类型包括插入操作、更新操作和删除操作,所述基于预设规则对多个所述增量操作进行合并,得到目标增量操作,包括:当多个所述增量操作依次为插入操作和更新操作时,基于所述预设规则合并多个所述增量操作,确定所述目标增量操作为插入操作;当多个所述增量操作依次为插入操作和删除操作时,基于所述预设规则合并所述增量操作,确定所述目标增量操作为删除操作;当多个所述增量操作均为更新操作时,基于所述预设规则合并所述增量操作,确定所述目标增量操作为更新操作;当多个所述增量操作依次为更新操作和删除操作时,基于所述预设规则合并所述增量操作,确定所述目标增量操作为删除操作;当多个所述增量操作依次为删除操作和插入操作时,基于所述预设规则合并所述增量操作,确定所述目标增量操作为插入操作。With reference to the second implementation manner of the first aspect, in the third implementation manner of the first aspect, the type of the incremental operation includes an insert operation, an update operation, and a delete operation, and the multiple incremental operations based on preset rules Combining operations to obtain a target incremental operation, including: when multiple incremental operations are insert operations and update operations in sequence, merging multiple incremental operations based on the preset rules to determine the target incremental operation The operation is an insert operation; when multiple incremental operations are insert operations and delete operations in sequence, merge the incremental operations based on the preset rules, and determine that the target incremental operation is a delete operation; when multiple incremental operations When the incremental operations are all update operations, merge the incremental operations based on the preset rules, and determine that the target incremental operation is an update operation; when multiple incremental operations are update operations and delete operations in sequence merging the incremental operations based on the preset rules, determining that the target incremental operation is a delete operation; when multiple incremental operations are delete operations and insert operations in sequence, merging the incremental operations based on the preset rules the incremental operation, and determine that the target incremental operation is an insert operation.
本发明实施例提供的基于物化视图日志的增量同步方法,通过预设规则按照多个增量操作的产生顺序进行合并,保证针对于同一主键的多个增量操作能够进行合并,最大限度地减少了回表查询的次数。The incremental synchronization method based on the materialized view log provided by the embodiment of the present invention merges multiple incremental operations according to the generation order of multiple incremental operations through preset rules, so as to ensure that multiple incremental operations for the same primary key can be merged, maximizing Reduced the number of queries back to the table.
结合第一方面,在第一方面的第四实施方式中,在所述基于所述数据源表的元数据确定所述增量操作对应的字段之后,还包括:判断所述增量操作对应的字段是否为大数据字段;当所述增量操作对应的字段非所述大数据字段时,判断所述增量操作对应的字段是否处于所述属性列中。With reference to the first aspect, in the fourth implementation manner of the first aspect, after determining the field corresponding to the incremental operation based on the metadata of the data source table, it further includes: judging the field corresponding to the incremental operation Whether the field is a large data field; when the field corresponding to the incremental operation is not the large data field, determine whether the field corresponding to the incremental operation is in the attribute column.
本发明实施例提供的基于物化视图日志的增量同步方法,对于非大数据字段的增量操作,通过确定增量操作对应的字段以减少非大数据字段的回表查询次数,对不包含大数据字段的数据源表,或针对大数据字段读取较多写入较少的场景,提升了增量数据的抽取性能。In the incremental synchronization method based on materialized view logs provided by the embodiment of the present invention, for incremental operations on non-big data fields, the number of query times for non-big data fields is reduced by determining the fields corresponding to the incremental operations. The data source table of the data field, or the scene where the large data field reads more and writes less, improves the extraction performance of incremental data.
结合第一方面,在第一方面的第五实施方式中,所述获取源数据库中的数据源表,创建所述数据源表对应的物化视图日志,包括:获取所述数据源表的元数据,所述元数据包括数据主键、数据名称以及数据类型;基于所述元数据的数据主键、数据名称以及数据类型,生成对应于所述元数据的物化视图日志。With reference to the first aspect, in the fifth implementation manner of the first aspect, the acquiring the data source table in the source database and creating the materialized view log corresponding to the data source table includes: acquiring metadata of the data source table , the metadata includes a data primary key, a data name, and a data type; based on the data primary key, data name, and data type of the metadata, a materialized view log corresponding to the metadata is generated.
本发明实施例提供的基于物化视图日志的增量同步方法,通过元数据对应的数据主键、数据名称以及数据类型,构建相应的物化视图日志,以便准确定位至产生增量操作的数据。The incremental synchronization method based on the materialized view log provided by the embodiment of the present invention constructs the corresponding materialized view log through the data primary key, data name and data type corresponding to the metadata, so as to accurately locate the data that generates the incremental operation.
结合第一方面,在第一方面的第六实施方式中,在将所述增量数据同步至目标数据库之后,还包括:对所述物化视图日志进行数据清理。With reference to the first aspect, in the sixth implementation manner of the first aspect, after synchronizing the incremental data to the target database, further includes: performing data cleaning on the materialized view log.
本发明实施例提供的基于物化视图日志的增量同步方法,在完成增量数据的同步之后,从物化视图日志中清理已完成数据同步的日志数据,释放数据空间,节省了数据占用空间。In the incremental synchronization method based on the materialized view log provided by the embodiment of the present invention, after the synchronization of the incremental data is completed, the log data that has been synchronized is cleared from the materialized view log to release the data space and save the space occupied by the data.
根据第二方面,本发明实施例提供了一种基于物化视图日志的增量同步装置,包括:获取模块,用于获取源数据库中的数据源表,创建所述数据源表对应的物化视图日志,所述物化视图日志包括属性列,所述属性列用于所述数据源表的元数据的更新;第一确定模块,用于当产生针对于所述数据源表的增量操作时,确定所述增量操作对应的字段;提取模块,用于当所述增量操作对应的字段处于所述属性列中,则从所述物化视图日志中提取所述增量操作对应的增量数据;同步模块,用于将所述增量数据同步至目标数据库。According to the second aspect, an embodiment of the present invention provides an incremental synchronization device based on a materialized view log, including: an acquisition module, configured to acquire a data source table in a source database, and create a materialized view log corresponding to the data source table , the materialized view log includes an attribute column, and the attribute column is used for updating the metadata of the data source table; the first determination module is configured to determine when an incremental operation for the data source table is generated The field corresponding to the incremental operation; an extraction module, configured to extract the incremental data corresponding to the incremental operation from the materialized view log when the field corresponding to the incremental operation is in the attribute column; A synchronization module, configured to synchronize the incremental data to the target database.
根据第三方面,本发明实施例提供了一种计算机设备,包括:存储器和处理器,所述存储器和所述处理器之间互相通信连接,所述存储器中存储有计算机指令,所述处理器通过执行所述计算机指令,从而执行第一方面或第一方面任一实施方式所述的基于物化视图日志的增量同步方法。According to a third aspect, an embodiment of the present invention provides a computer device, including: a memory and a processor, the memory and the processor are connected to each other in communication, the memory stores computer instructions, and the processor By executing the computer instructions, the materialized view log-based incremental synchronization method described in the first aspect or any implementation manner of the first aspect is executed.
根据第四方面,本发明实施例提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机指令,所述计算机指令用于使计算机执行第一方面或第一方面任一实施方式所述的基于物化视图日志的增量同步方法。According to the fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, the computer-readable storage medium stores computer instructions, and the computer instructions are used to make the computer execute the first aspect or any implementation of the first aspect. The incremental synchronization method based on the materialized view log described in the method.
需要说明的是,本发明实施例提供的基于物化视图日志的增量同步装置、计算机设备以及计算机可读存储介质的相应有益效果,请参见基于物化视图日志的增量同步方法中相应内容的描述,在此不再赘述。It should be noted that, for the corresponding beneficial effects of the materialized view log-based incremental synchronization device, computer equipment, and computer-readable storage medium provided by the embodiment of the present invention, please refer to the description of the corresponding content in the materialized view log-based incremental synchronization method. , which will not be repeated here.
附图说明Description of drawings
为了更清楚地说明本发明具体实施方式或现有技术中的技术方案,下面将对具体实施方式或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图是本发明的一些实施方式,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the specific implementation of the present invention or the technical solutions in the prior art, the following will briefly introduce the accompanying drawings that need to be used in the specific implementation or description of the prior art. Obviously, the accompanying drawings in the following description The drawings show some implementations of the present invention, and those skilled in the art can obtain other drawings based on these drawings without any creative work.
图1示出了本发明实施例中数据同步系统的架构图;Fig. 1 shows the architecture diagram of the data synchronization system in the embodiment of the present invention;
图2是本发明实施例的基于物化视图日志的增量同步方法的流程图;Fig. 2 is the flowchart of the incremental synchronization method based on the materialized view log of the embodiment of the present invention;
图3是本发明实施例的基于物化视图日志的增量同步方法的另一流程图;Fig. 3 is another flowchart of the incremental synchronization method based on the materialized view log according to the embodiment of the present invention;
图4是本发明实施例的基于物化视图日志的增量同步方法的又一流程图;Fig. 4 is another flowchart of the incremental synchronization method based on the materialized view log according to the embodiment of the present invention;
图5是本发明实施例的基于物化视图日志的增量同步装置的结构框图;FIG. 5 is a structural block diagram of an incremental synchronization device based on a materialized view log according to an embodiment of the present invention;
图6是本发明实施例提供的计算机设备的硬件结构示意图。FIG. 6 is a schematic diagram of a hardware structure of a computer device provided by an embodiment of the present invention.
具体实施方式detailed description
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。In order to make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments It is a part of embodiments of the present invention, but not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without making creative efforts belong to the protection scope of the present invention.
目前,对于数据库增量数据的同步,需要从源数据库中抽取增量数据,经过分析和转换,编写成适用于目标数据库的数据库脚本文件和操作语言,以将增量数据存放至目标数据库中。然而,Oracle数据库为常用的数据库类型之一,Oracle数据库的同步依赖于物化视图日志,根据物化视图日志中的主键到数据源表中进行回表查询,从数据源表中获取完整的增量数据。然而,物化视图日志中的每条日志数据都需要进行回表查询,由于网络传输和数据库性能的影响,导致增量数据的抽取效率较低,进而影响了增量数据的同步效率。At present, for the synchronization of database incremental data, it is necessary to extract incremental data from the source database, analyze and convert it, and write it into a database script file and operating language suitable for the target database, so as to store the incremental data in the target database. However, the Oracle database is one of the commonly used database types. The synchronization of the Oracle database depends on the materialized view log. According to the primary key in the materialized view log, the table is returned to the data source table to obtain complete incremental data from the data source table. . However, each log data in the materialized view log needs to be queried back to the table. Due to the influence of network transmission and database performance, the extraction efficiency of incremental data is low, which in turn affects the synchronization efficiency of incremental data.
基于此,本技术方案根据物化视图日志与数据源表之间的操作特征,从物化视图日志中提取增量数据,由此无需回表查询,减少了回表查询的次数,提升了增量数据的同步效率,提高了数据库增量同步的性能。Based on this, this technical solution extracts incremental data from the materialized view log according to the operational characteristics between the materialized view log and the data source table, thereby eliminating the need for back-to-table queries, reducing the number of back-to-table queries, and improving incremental data. The synchronization efficiency improves the performance of database incremental synchronization.
本发明实施例提供一种数据同步系统,如图1所示,该数据同步系统包括:源数据库、目标数据库、管理节点集群、同步节点集群以及协调器集群。在进行数据同步时,从源数据库中抽取增量数据,经过管理节点集群、同步节点集群以及协调器集群对增量数据进行分析和转换,编写成适用于目标库数据库的操作语言,并将增量数据存放至目标数据库中。An embodiment of the present invention provides a data synchronization system. As shown in FIG. 1 , the data synchronization system includes: a source database, a target database, a management node cluster, a synchronization node cluster, and a coordinator cluster. During data synchronization, the incremental data is extracted from the source database, and the incremental data is analyzed and converted through the management node cluster, synchronization node cluster, and coordinator cluster, and written into an operating language suitable for the target database database. The volume data is stored in the target database.
其中,源数据库可以为分布式关系型数据库,可以为分布式文件系统,还可以为非结构化数据库,此处对数据库的类型不作具体限定。Wherein, the source database may be a distributed relational database, a distributed file system, or an unstructured database, and the type of the database is not specifically limited here.
其中,目标数据库可以为分布式关系型数据库,可以为分布式文件系统,还可以为非结构化数据库,此处对数据库的类型不作具体限定。Wherein, the target database may be a distributed relational database, a distributed file system, or an unstructured database, and the type of the database is not specifically limited here.
其中,管理节点集群用于数据的核对配置,并将数据核对配置信息推送同步节点集群。同时接收同步节点集群反馈的数据同步状态、进度等信息。Among them, the management node cluster is used for data verification configuration, and the data verification configuration information is pushed to the synchronization node cluster. At the same time, it receives information such as data synchronization status and progress fed back by the synchronization node cluster.
其中,同步节点集群用于执行具体的数据同步过程。Wherein, the synchronization node cluster is used to execute a specific data synchronization process.
其中,协调器集群用于协调源数据至目标数据库的数据同步。Among them, the coordinator cluster is used to coordinate the data synchronization from the source data to the target database.
根据本发明实施例,提供了一种基于物化视图日志的增量同步的实施例,需要说明的是,在附图的流程图示出的步骤可以在诸如一组计算机可执行指令的计算机系统中执行,并且,虽然在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤。According to an embodiment of the present invention, an embodiment of incremental synchronization based on a materialized view log is provided. It should be noted that the steps shown in the flow chart of the accompanying drawings can be implemented in a computer system such as a set of computer-executable instructions and, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that shown or described herein.
在本实施例中提供了一种基于物化视图日志的增量同步方法,可用于上述数据同步系统中源数据库与目标数据库之间的数据同步,本实施针对源数据库为关系型数据库Oracle,目标数据库为任意关系型数据库,非关系型数据库或者消息队列等。图2是根据本发明实施例的基于物化视图日志的增量同步方法的流程图,如图2所示,该流程包括如下步骤:In this embodiment, an incremental synchronization method based on materialized view logs is provided, which can be used for data synchronization between the source database and the target database in the above-mentioned data synchronization system. In this implementation, the source database is a relational database Oracle, and the target database For any relational database, non-relational database or message queue, etc. Fig. 2 is the flow chart of the incremental synchronization method based on the materialized view log according to the embodiment of the present invention, as shown in Fig. 2, the process includes the following steps:
S11,获取源数据库中的数据源表,创建数据源表对应的物化视图日志。其中,物化视图日志包括多个属性列,该属性列用于确定所述数据源表的增量操作所涉及的字段和更新值。S11. Obtain a data source table in the source database, and create a materialized view log corresponding to the data source table. Wherein, the materialized view log includes a plurality of attribute columns, and the attribute columns are used to determine the fields and update values involved in the incremental operation of the data source table.
数据源表为源数据库中用于记录源数据的数据表,物化视图日志是用来记录数据源表更新操作(即增加、删除和修改)的日志表。源数据库中通常包括有多个数据源表,不同的数据源表对应于不同的物化视图日志,即每个数据源表均对应有一个物化视图日志,且占用一定的存储空间。The data source table is a data table used to record source data in the source database, and the materialized view log is a log table used to record update operations (ie, add, delete, and modify) of the data source table. The source database usually includes multiple data source tables, and different data source tables correspond to different materialized view logs, that is, each data source table corresponds to a materialized view log and occupies a certain amount of storage space.
具体地,在获取到数据源表之后,创建相应的物化视图日志。在创建数据源表所对应的物化视图日志表时,自动生成针对于每条日志记录的rowid。同时,在创建物化视图日志的过程中,可以指定物化视图日志的属性列,通过属性列对数据源表中源数据的更新进行表征,该属性列可以包括filter columns、sequence、including new values、dmltype、oldnew、changevector等。Specifically, after the data source table is obtained, a corresponding materialized view log is created. When creating the materialized view log table corresponding to the data source table, the rowid for each log record is automatically generated. At the same time, in the process of creating a materialized view log, you can specify the attribute column of the materialized view log, and use the attribute column to represent the update of the source data in the data source table. The attribute column can include filter columns, sequence, including new values, dmltype , oldnew, changevector, etc.
需要说明的是,filter columns用于记录数据源表中的列值,可以是数据源表中非大数据(LOB)类型的任意字段组合;dmltype表示增量操作类型,可以是插入操作INSERT(I)、更新操作UPDATE(U)、删除操作DELETE(D);including new values表示包含被更新的数据;oldnew表示该行日志是新值还是旧值;changevector表示发生数据更新的列,用raw(255)保存,被更新的列在相应的二进制位标识为1;sequence表示增量操作的顺序。当然还可以根据实际需要指定其他属性或者特征,此处不作具体限定。It should be noted that the filter columns are used to record the column values in the data source table, which can be any combination of non-big data (LOB) fields in the data source table; dmltype indicates the type of incremental operation, which can be an insert operation INSERT(I ), update operation UPDATE(U), delete operation DELETE(D); including new values indicates that the updated data is included; oldnew indicates whether the row log is a new value or an old value; changevector indicates the column where the data update occurs, and raw(255 ) is saved, and the updated column is marked as 1 in the corresponding binary bit; sequence indicates the order of the incremental operation. Of course, other attributes or features may also be specified according to actual needs, which are not specifically limited here.
S12,当产生针对于数据源表的增量操作时,确定增量操作对应的字段。S12. When generating an incremental operation on the data source table, determine a field corresponding to the incremental operation.
数据同步系统可以实时监测针对于数据源表是否产生了增量操作。当产生针对于数据源表的增量操作时,物化视图日志会跟随数据源表的增量操作而联动刷新。通过查询数据源表的元数据,可以得到数据源表的字段信息(包括名称、类型、在数据源表中的顺序等)。物化视图日志中的changevector列能够标记数据源表中发生更改的字段。此处通过比对元数据中字段信息与changevector列,即可确定出增量操作对应的字段。即通过检测物化视图日志表中的changevector列即可推断出物化视图日志中产生增量操作所对应的字段。The data synchronization system can monitor in real time whether incremental operations have occurred on the data source table. When an incremental operation on the data source table is generated, the materialized view log will be refreshed in conjunction with the incremental operation on the data source table. By querying the metadata of the data source table, the field information (including name, type, order in the data source table, etc.) of the data source table can be obtained. The changevector column in the materialized view log can mark the changed fields in the data source table. Here, by comparing the field information in the metadata with the changevector column, the field corresponding to the incremental operation can be determined. That is, by detecting the changevector column in the materialized view log table, the field corresponding to the incremental operation in the materialized view log can be deduced.
S13,当增量操作对应的字段处于属性列中,则从物化视图日志中提取增量操作对应的增量数据。S13. When the field corresponding to the incremental operation is in the attribute column, extract the incremental data corresponding to the incremental operation from the materialized view log.
物化视图日志中的filter columns记录了数据源表中的列值,将产生增量操作的字段与属性列中的filter columns所包含字段进行对比,以确定增量操作对应的字段是否处于filter columns中。若增量操作对应的字段处于filter columns中,表示增量操作对应的增量数据能够直接从物化视图日志中组装获取,此时,无需回到数据源表进行增量数据的查询,直接从物化视图日志中提取增量操作对应的增量数据即可。The filter columns in the materialized view log record the column values in the data source table, and compare the field that generates the incremental operation with the fields contained in the filter columns in the attribute column to determine whether the field corresponding to the incremental operation is in the filter columns . If the field corresponding to the incremental operation is in the filter columns, it means that the incremental data corresponding to the incremental operation can be assembled and obtained directly from the materialized view log. At this time, there is no need to return to the data source table to query the incremental data, and the The incremental data corresponding to the incremental operation can be extracted from the view log.
S14,将增量数据同步至目标数据库。S14, synchronizing the incremental data to the target database.
数据库同步系统从物化视图日志中抽取到增量操作对应的增量数据后,将增量数据编写为适用于目标数据库的操作语言,以将该增量数据同步至目标数据库,无需再次回表查询即可对目标数据库中的数据进行增量更新。After the database synchronization system extracts the incremental data corresponding to the incremental operation from the materialized view log, it writes the incremental data into an operating language suitable for the target database, so as to synchronize the incremental data to the target database without going back to the table for query again The data in the target database can be incrementally updated.
本实施例提供的基于物化视图日志的增量同步方法,通过在数据源表对应的物化视图日志中指定多个属性列,在产生增量操作时,结合多个属性列以及数据源表的元数据即可确定出增量操作所对应的字段。若该字段处于属性列中则从物化视图日志中提取增量数据,由此无需回表查询,减少了回表查询的次数,提升了增量数据的同步效率,提高了数据库增量同步的性能。In the incremental synchronization method based on the materialized view log provided in this embodiment, by specifying multiple attribute columns in the materialized view log corresponding to the data source table, when generating an incremental operation, combine multiple attribute columns and metadata of the data source table The data can determine the field corresponding to the incremental operation. If the field is in the attribute column, the incremental data is extracted from the materialized view log, thus eliminating the need to return to the table for query, reducing the number of table return queries, improving the synchronization efficiency of incremental data, and improving the performance of database incremental synchronization .
在本实施例中提供了一种基于物化视图日志的增量同步方法,可用于上述数据同步系统中源数据库与目标数据库之间的数据同步。本实施针对源数据库为关系型数据库Oracle,目标数据库为任意关系型数据库,非关系型数据库或者消息队列等。图3是根据本发明实施例的基于物化视图日志的增量同步方法的流程图,如图3所示,该流程包括如下步骤:In this embodiment, an incremental synchronization method based on materialized view logs is provided, which can be used for data synchronization between the source database and the target database in the above data synchronization system. In this implementation, the source database is a relational database Oracle, and the target database is any relational database, non-relational database, or message queue. Fig. 3 is a flowchart of an incremental synchronization method based on a materialized view log according to an embodiment of the present invention. As shown in Fig. 3 , the process includes the following steps:
S21,获取源数据库中的数据源表,创建数据源表对应的物化视图日志。其中,物化视图日志包括多个属性列,该属性列用于确定所述数据源表的增量操作所涉及的字段和更新值。S21. Obtain a data source table in the source database, and create a materialized view log corresponding to the data source table. Wherein, the materialized view log includes a plurality of attribute columns, and the attribute columns are used to determine the fields and update values involved in the incremental operation of the data source table.
详细说明参见上述实施例对应的相关描述,此处不再赘述。For details, refer to relevant descriptions corresponding to the foregoing embodiments, and details are not repeated here.
S22,当产生针对于数据源表的增量操作时,确定增量操作对应的字段。S22. When generating an incremental operation on the data source table, determine a field corresponding to the incremental operation.
具体地,多个属性列包括字段变更向量列、增量操作类型列以及数据列,通过字段变更向量列用来表示数据源表当次增量操作所涉及的字段。增量操作类型列用于记录当次增量操作的类型,包括插入操作、修改以及删除操作等。通过数据列记录数据源表中非大数据字段的名称以及最新字段值。Specifically, the multiple attribute columns include a field change vector column, an incremental operation type column, and a data column, and the field change vector column is used to indicate the fields involved in the current incremental operation of the data source table. The incremental operation type column is used to record the type of the current incremental operation, including insert, modify, and delete operations. Record the name of the non-big data field in the data source table and the latest field value through the data column.
相应地,上述步骤S22可以包括:Correspondingly, the above step S22 may include:
S221,当产生针对于数据源表的增量操作时,将增量操作更新在增量操作类型列。S221. When generating an incremental operation for the data source table, update the incremental operation in the incremental operation type column.
在检测到数据源表产生增量操作时,物化视图日志会跟随数据源表中的源数据更新而更新,并确定对应于当前源数据更新的增量操作类型,例如插入新数据、更新某一行数据的值等,并将该增量操作记录在物化视图日志的增量操作类型列。When an incremental operation is detected on the data source table, the materialized view log will be updated following the update of the source data in the data source table, and determine the incremental operation type corresponding to the current source data update, such as inserting new data, updating a row The value of the data, etc., and record the incremental operation in the incremental operation type column of the materialized view log.
S222,基于增量操作类型列、字段变更向量列以及数据列,确定对应于增量操作的字段。S222. Based on the incremental operation type column, the field change vector column and the data column, determine the field corresponding to the incremental operation.
根据物化视图日志的字段变更向量列可以得知数据源表中发生数据更新的位置。根据增量操作类型列可以得知数据更新方式,例如插入新数据、删除数据以及更新数据值等。通过查询物化视图日志的增量操作类型列、字段变更向量列以及非大数据字段的数据列即可确定当前发生增量操作的字段。According to the field change vector column of the materialized view log, the location where the data update occurs in the data source table can be known. According to the incremental operation type column, you can know the data update method, such as inserting new data, deleting data, and updating data values. By querying the incremental operation type column, field change vector column, and data column of non-big data fields in the materialized view log, you can determine the field where the incremental operation currently occurs.
此处以一具体示例进行说明,如表1所示的数据源表,建立对应于表1的物化视图日志。继而对表1进行如下增量操作:A specific example is used here to illustrate, as shown in Table 1, a materialized view log corresponding to Table 1 is created. Then perform the following incremental operations on Table 1:
1)插入id=1的数据;1) Insert data with id=1;
2)修改id=2的price字段。2) Modify the price field with id=2.
经过上述增量操作后的物化视图日志如表2所示。其中,dmltype指示了增量操作的类型,即增量操作类型列;oldnew指示了该行日志是新值(N)还是旧值(O);changvector用于指示发生字段变更的数据列,即字段变更向量列。Table 2 shows the materialized view logs after the above incremental operations. Among them, dmltype indicates the type of incremental operation, that is, the incremental operation type column; oldnew indicates whether the row log is a new value (N) or an old value (O); changvector is used to indicate the data column where the field change occurs, that is, the field Alter the vector columns.
表1数据源表示例Table 1 Data source table example
表2物化视图日志示例Table 2 Materialized view log example
需要说明的是,字段变更向量列changevector采用二进制,每个bit位的0和1表示对应列是否有变更,比如第一例有变更就是00000010,第二列有变更是00000100,第三列有变更就是00001000,另外采取小端模式转换成16进制,分别是0200,0400和0800。插入操作对应的changevector为FEFF。删除操作对应的changevector为0000。It should be noted that the field change vector column changevector uses binary, and the 0 and 1 of each bit indicate whether the corresponding column has changed. For example, if there is a change in the first example, it is 00000010, if the second column is changed, it is 00000100, and if the third column is changed It is 00001000, and the little endian mode is used to convert it into hexadecimal, which are 0200, 0400 and 0800 respectively. The changevector corresponding to the insert operation is FEFF. The changevector corresponding to the delete operation is 0000.
作为一个可选的实施方式,上述步骤S222可以包括:As an optional implementation manner, the above step S222 may include:
(1)基于物化视图日志的主键信息,确定相同主键信息对应的增量操作。(1) Based on the primary key information of the materialized view log, determine the incremental operation corresponding to the same primary key information.
在数据源表与物化视图日志的关系中,通过主键信息(primary key)来表示物化视图日志对应于数据源表的数据记录,即通过主键信息用于唯一标识每条数据记录。In the relationship between the data source table and the materialized view log, the primary key information (primary key) is used to indicate that the materialized view log corresponds to the data record of the data source table, that is, the primary key information is used to uniquely identify each data record.
根据物化视图日志的主键信息可以唯一对应数据源表中的数据记录。不同的主键信息对应于不同的数据记录,通过物化视图日志的各主键信息可以唯一确定出针对于各主键信息所对应数据产生的增量操作。According to the primary key information of the materialized view log, it can uniquely correspond to the data record in the data source table. Different primary key information corresponds to different data records, and the incremental operations for the data corresponding to each primary key information can be uniquely determined through the primary key information of the materialized view log.
(2)当增量操作存在多个时,基于预设规则对多个增量操作进行合并,得到目标增量操作。(2) When there are multiple incremental operations, the multiple incremental operations are combined based on preset rules to obtain the target incremental operation.
预设规则为根据多个增量操作的产生顺序所预先设定的合并规则。当针对于相同主键信息的数据产生多次增量操作时,为了便于最终增量操作的确定,可以根据预设规则将各个增量操作进行合并,得到多个增量操作合并之后的目标增量操作,并将目标增量操作更新至增量操作类型列dmltype。The preset rule is a merge rule preset according to the generation sequence of multiple incremental operations. When multiple incremental operations are generated for data with the same primary key information, in order to facilitate the determination of the final incremental operation, each incremental operation can be combined according to the preset rules to obtain the target increment after the merger of multiple incremental operations operation, and update the target incremental operation to the incremental operation type column dmltype.
具体地,增量操作的类型包括插入操作、更新操作和删除操作,此处以2个增量操作为例,上述步骤(2)可以包括:Specifically, the type of incremental operation includes an insert operation, an update operation, and a delete operation. Taking two incremental operations as an example, the above step (2) may include:
(21)当多个增量操作依次为插入操作和更新操作时,基于预设规则合并多个增量操作,确定目标增量操作为插入操作。(21) When multiple incremental operations are insert operations and update operations in sequence, multiple incremental operations are merged based on preset rules, and the target incremental operation is determined to be an insert operation.
(22)当多个增量操作依次为插入操作和删除操作时,基于预设规则合并增量操作,确定目标增量操作为删除操作。(22) When multiple incremental operations are insertion operations and deletion operations in sequence, the incremental operations are combined based on preset rules, and the target incremental operation is determined to be a deletion operation.
(23)当多个增量操作均为更新操作时,基于预设规则合并增量操作,确定目标增量操作为更新操作。(23) When multiple incremental operations are update operations, merge the incremental operations based on preset rules, and determine the target incremental operation as the update operation.
(24)当多个增量操作依次为更新操作和删除操作时,基于预设规则合并增量操作,确定目标增量操作为删除操作。(24) When multiple incremental operations are update operations and delete operations in turn, merge the incremental operations based on preset rules, and determine the target incremental operation as the delete operation.
(25)当多个增量操作依次为删除操作和插入操作时,基于预设规则合并增量操作,确定目标增量操作为插入操作。(25) When multiple incremental operations are deletion operations and insertion operations in sequence, the incremental operations are combined based on preset rules, and the target incremental operation is determined to be an insertion operation.
对于上述增量操作的合并可以表示为:The merging of the above incremental operations can be expressed as:
1)INSERT+UPDATE=INSERT;1) INSERT+UPDATE=INSERT;
2)INSERT+DELETE=DELETE;2) INSERT+DELETE=DELETE;
3)UPDATE+UPDATE=UPDATE;3) UPDATE+UPDATE=UPDATE;
4)UPDATE+DELETE=DELETE;4) UPDATE+DELETE=DELETE;
5)DELETE+INSERT=INSERT。5) DELETE+INSERT=INSERT.
具体地,在创建物化视图日志时,指定记录增量操作顺序的属性sequence。对于两个以上的增量操作的合并,其可以依次进行两两合并,得到目标增量操作。以3个为例进行说明。当3个增量操作依次为删除操作、插入操作和更新操作时,首先删除操作与插入操作合并,得到插入操作;然后,插入操作与更新操作合并,得到插入操作,即目标增量操作为插入操作。Specifically, when creating a materialized view log, specify the attribute sequence that records the sequence of incremental operations. For the merging of more than two incremental operations, pairwise merging can be performed sequentially to obtain the target incremental operation. Take 3 examples for illustration. When the three incremental operations are delete operation, insert operation and update operation in turn, the delete operation is first merged with the insert operation to obtain the insert operation; then, the insert operation is merged with the update operation to obtain the insert operation, that is, the target incremental operation is insert operate.
(3)基于字段变更向量列、目标增量操作以及数据列,确定目标增量操作在物化视图日志中的字段。(3) Based on the field change vector column, the target incremental operation and the data column, determine the field of the target incremental operation in the materialized view log.
根据字段变更向量列可以得知数据源表中发生数据更新的源数据所处位置,继而根据目标增量操作可以得知源数据的最终更新值。目标增量操作记录在增量操作类型列,由此通过查询物化视图日志的增量操作类型列、字段变更向量列以及非大数据字段的数据列即可确定当前发生增量操作的字段。According to the field change vector column, the location of the source data where data update occurs in the data source table can be known, and then the final update value of the source data can be known according to the target incremental operation. The target incremental operation is recorded in the incremental operation type column, so the field where the current incremental operation occurs can be determined by querying the incremental operation type column, field change vector column, and non-big data field data column of the materialized view log.
根据增量操作的合并结果判断是否需要回表查询,具体地,增量操作合并后的操作类型有三种情况:According to the merged result of the incremental operation, it is judged whether the query needs to be returned to the table. Specifically, there are three types of operations after the incremental operation is merged:
1)合并后是插入操作,如果物化视图日志的属性列filter columns中包含所有数据源表的字段,则无需回表,否则需要回表查询。1) After merging, it is an insert operation. If the attribute column filter columns of the materialized view log contains all the fields of the data source table, there is no need to return to the table, otherwise it is necessary to return to the table for query.
2)合并后是更新操作,依据各次更新操作依次合并多次更新的字段,若更新字段包含在属性列filter columns中,则无需回表;否则需要回表查询。2) After merging, there is an update operation. The fields that have been updated multiple times are merged in turn according to each update operation. If the updated fields are included in the attribute column filter columns, there is no need to return to the table; otherwise, it is necessary to return to the table for query.
3)合并后是删除操作,则根据主键信息进行删除即可,无需回表。3) After the merge is a delete operation, it can be deleted according to the primary key information without returning to the table.
此处以一具体示例进行说明,如表1所示的数据源表,在建立完对应于表1的物化视图日志之后,对表1中的元数据进行如表3所示的多次增量操作。Here is a specific example for illustration. For the data source table shown in Table 1, after the materialized view log corresponding to Table 1 is created, multiple incremental operations are performed on the metadata in Table 1 as shown in Table 3. .
表3多次增量操作示例Table 3 Example of multiple incremental operations
在一批物化视图日志数据中,对相同主键信息的日志数据进行增量操作的合并,得到如图4所示的物化视图日志。In a batch of materialized view log data, the log data with the same primary key information are merged incrementally to obtain the materialized view log as shown in Figure 4.
表4增量操作合并后的物化视图日志示例Table 4 Example of materialized view logs after incremental operations are merged
S23,当增量操作对应的字段处于属性列中,则从物化视图日志中提取增量操作对应的增量数据。S23. When the field corresponding to the incremental operation is in the attribute column, extract the incremental data corresponding to the incremental operation from the materialized view log.
详细说明参见上述实施例对应的相关描述,此处不再赘述。For details, refer to relevant descriptions corresponding to the foregoing embodiments, and details are not repeated here.
S24,将增量数据同步至目标数据库。S24, synchronizing the incremental data to the target database.
详细说明参见上述实施例对应的相关描述,此处不再赘述。For details, refer to relevant descriptions corresponding to the foregoing embodiments, and details are not repeated here.
本实施例提供的基于物化视图日志的增量同步方法,将增量操作更新在增量操作类型列,以便根据增量操作类型、字段变更向量列以及数据列确定出产生数据更新的字段,即从物化视图日志中确定出产生增量操作的字段。由此能够根据增量操作确认发生更新的数据,无需返回数据源表进行数据查询,减少增量数据的回表查询时间,提高了数据同步效率。通过将多个增量操作进行合并,便于确认增量数据的最终状态,最大限度地减少了回表查询的次数,降低了增量数据的规模,便于后续操作。In the incremental synchronization method based on the materialized view log provided in this embodiment, the incremental operation is updated in the incremental operation type column, so as to determine the field that generates the data update according to the incremental operation type, the field change vector column, and the data column, that is, From the materialized view log, determine the field that generated the incremental operation. In this way, the updated data can be confirmed according to the incremental operation, and there is no need to return to the data source table for data query, which reduces the query time of incremental data back to the table, and improves the efficiency of data synchronization. By merging multiple incremental operations, it is easy to confirm the final state of the incremental data, minimize the number of query back to the table, reduce the scale of incremental data, and facilitate subsequent operations.
在本实施例中提供了一种基于物化视图日志的增量同步方法,可用于上述数据同步系统中源数据库与目标数据库之间的数据同步。本实施针对源数据库为关系型数据库Oracle,目标数据库为任意关系型数据库,非关系型数据库或者消息队列等。图4是根据本发明实施例的基于物化视图日志的增量同步方法的流程图,如图4所示,该流程包括如下步骤:In this embodiment, an incremental synchronization method based on materialized view logs is provided, which can be used for data synchronization between the source database and the target database in the above data synchronization system. In this implementation, the source database is a relational database Oracle, and the target database is any relational database, non-relational database, or message queue. Fig. 4 is a flowchart of an incremental synchronization method based on a materialized view log according to an embodiment of the present invention. As shown in Fig. 4, the process includes the following steps:
S31,获取源数据库中的数据源表,创建数据源表对应的物化视图日志。其中,物化视图日志包括多个属性列,该属性列用于确定所述数据源表的增量操作所涉及的字段和更新值。S31. Obtain a data source table in the source database, and create a materialized view log corresponding to the data source table. Wherein, the materialized view log includes a plurality of attribute columns, and the attribute columns are used to determine the fields and update values involved in the incremental operation of the data source table.
作为一个可选的实施方式,上述步骤可以包括:As an optional implementation manner, the above steps may include:
S311,获取数据源表的元数据,该元数据包括数据主键、数据名称以及数据类型。S311. Obtain metadata of a data source table, where the metadata includes a data primary key, a data name, and a data type.
数据同步系统读取源数据库中待同步的数据源表,并读取数据源表中的元数据,包括数据源表的数据主键、数据名称以及数据类型,还可以包括数据位置,当然还可以包括其他信息,此处不作具体限定。The data synchronization system reads the data source table to be synchronized in the source database, and reads the metadata in the data source table, including the data primary key, data name and data type of the data source table, and can also include the data location, of course, it can also include Other information is not specifically limited here.
其中,数据主键用于标识数据唯一性,例如上述表1所示例的数据标识ID;数据名称表示每列数据的名称,例如上述表1所示例的name;数据类型表示数据的类型,如lob类型、text类型等,例如上述表1所示例的des(clob);数据位置表示数据表在源数据库中的索引位置。Among them, the data primary key is used to identify the uniqueness of the data, such as the data identification ID shown in the above table 1; the data name represents the name of each column of data, such as the name shown in the above table 1; the data type represents the type of data, such as the lob type , text type, etc., such as des(clob) shown in Table 1 above; the data position indicates the index position of the data table in the source database.
S312,基于元数据的数据主键、数据名称以及数据类型,生成对应于元数据的物化视图日志。S312. Based on the data primary key, data name and data type of the metadata, generate a materialized view log corresponding to the metadata.
根据基于元数据的数据主键、数据名称以及数据类型,确定该元数据所描述的数据源表的数据以及数据源表的字段信息等。继而基于元数据对应的数据源表的数据,调用相应的语法创建物化视图日志,并指定物化视图日志的多个属性列,包含dmltype、oldnew、changevector、sequence以及filter columns等特性。According to the data primary key, data name and data type based on the metadata, the data of the data source table described by the metadata and the field information of the data source table are determined. Then, based on the data in the data source table corresponding to the metadata, call the corresponding syntax to create a materialized view log, and specify multiple attribute columns of the materialized view log, including dmltype, oldnew, changevector, sequence, and filter columns.
需要说明的是,对于filter columns而言,其可以包含有多个字段,例如数据标识id、数据名称name以及其他数据信息等。It should be noted that, for filter columns, it may contain multiple fields, such as data identifier id, data name name, and other data information.
S32当产生针对于数据源表的增量操作时,确定增量操作对应的字段。S32: When generating an incremental operation on the data source table, determine a field corresponding to the incremental operation.
详细说明参见上述实施例对应的相关描述,此处不再赘述。For details, refer to relevant descriptions corresponding to the foregoing embodiments, and details are not repeated here.
S33,判断增量操作对应的字段是否为大数据字段。S33. Determine whether the field corresponding to the incremental operation is a large data field.
大数据字段表示该增量操作对应的数据为lob类型的数据。物化视图日志针对非lob字段的数据更新,对于lob字段的数据发生更新时,仍需要进行回表查询。The large data field indicates that the data corresponding to the incremental operation is lob type data. Materialized view logs are updated for data in non-lob fields. When data in lob fields is updated, table query still needs to be performed.
在检测到数据源表产生增量操作时,判断发生增量操作的字段是否为大数据字段,当增量操作对应的字段非大数据字段时,执行步骤S34,否则进行回表查询。When an incremental operation is detected in the data source table, it is judged whether the field in which the incremental operation occurs is a large data field, and when the field corresponding to the incremental operation is not a large data field, step S34 is executed; otherwise, the query is performed back to the table.
S34,判断增量操作对应的字段是否处于属性列中。S34. Determine whether the field corresponding to the increment operation is in the attribute column.
当增量操作对应的字段非大数据字段时,将增量操作对应的字段与属性列中所包含的各个字段进行对比,判断增量操作对应的字段是否处于属性列的filter columns中。当增量操作对应的字段处于属性列的filter columns中,则执行后续步骤S35,否则进行回表查询。When the field corresponding to the incremental operation is not a big data field, compare the field corresponding to the incremental operation with the fields contained in the attribute column to determine whether the field corresponding to the incremental operation is in the filter columns of the attribute column. When the field corresponding to the incremental operation is in the filter columns of the attribute column, then perform the subsequent step S35, otherwise perform a query back to the table.
具体地,数据同步系统可以依次分析物化视图日志中的每行数据,依次判断产生增量操作的数据是否需要回表查询,并对增量操作对应的增量数据进行统计。待完成数据汇总之后,对于需要回表查询的数据进行回表查询,对于无需回表查询的数据,则直接从物化视图日志中进行抽取。Specifically, the data synchronization system can sequentially analyze each row of data in the materialized view log, sequentially determine whether the data generated by the incremental operation needs to be returned to the table for query, and count the incremental data corresponding to the incremental operation. After the data summary is completed, query back to the table for the data that needs to be returned to the table for query, and directly extract from the materialized view log for the data that does not need to be returned to the table for query.
S35,当增量操作对应的字段处于属性列中,则从物化视图日志中提取增量操作对应的增量数据。S35. When the field corresponding to the incremental operation is in the attribute column, extract the incremental data corresponding to the incremental operation from the materialized view log.
详细说明参见上述实施例对应的相关描述,此处不再赘述。For details, refer to relevant descriptions corresponding to the foregoing embodiments, and details are not repeated here.
S36,将增量数据同步至目标数据库。S36, synchronizing the incremental data to the target database.
详细说明参见上述实施例对应的相关描述,此处不再赘述。For details, refer to relevant descriptions corresponding to the foregoing embodiments, and details are not repeated here.
S37,对物化视图日志进行数据清理。S37. Perform data cleaning on the materialized view log.
在目标数据库完成增量数据的入库后,从物化视图日志中清理已完成数据同步的日志数据,并继续等待下一次增量操作所产生的日志数据。当物化视图日志中存在新增的日志数据时,则返回步骤S32执行增量数据的同步过程。After the target database completes the storage of incremental data, clear the log data that has completed data synchronization from the materialized view log, and continue to wait for the log data generated by the next incremental operation. When there is newly added log data in the materialized view log, return to step S32 to execute the incremental data synchronization process.
本实施例提供的基于物化视图日志的增量同步方法,通过元数据对应的数据主键、数据名称以及数据类型,构建相应的物化视图日志,以便准确定位至产生增量操作的数据。对于非大数据字段的增量操作,通过确定增量操作对应的字段以减少非大数据字段的回表查询次数,对不包含大数据字段的数据源表,或针对大数据字段读取较多写入较少的场景,提升了增量数据的抽取性能。在完成增量数据的同步之后,从物化视图日志中清理已完成数据同步的日志数据,释放数据空间,节省了数据占用空间。The incremental synchronization method based on the materialized view log provided by this embodiment constructs the corresponding materialized view log through the data primary key, data name and data type corresponding to the metadata, so as to accurately locate the data that generates the incremental operation. For incremental operations on non-big data fields, reduce the number of table return queries for non-big data fields by determining the fields corresponding to the incremental operations. For data source tables that do not contain large data fields, or read more for large data fields Write less scenes and improve the performance of incremental data extraction. After the synchronization of the incremental data is completed, the log data that has been synchronized is cleared from the materialized view log to release the data space and save the space occupied by the data.
在本实施例中还提供了一种基于物化视图日志的增量同步装置,该装置用于实现上述实施例及优选实施方式,已经进行过说明的不再赘述。如以下所使用的,术语“模块”可以实现预定功能的软件和/或硬件的组合。尽管以下实施例所描述的装置较佳地以软件来实现,但是硬件,或者软件和硬件的组合的实现也是可能并被构想的。In this embodiment, an incremental synchronization device based on materialized view logs is also provided, which is used to implement the above-mentioned embodiments and preferred implementation modes, and those that have already been explained will not be repeated. As used below, the term "module" may be a combination of software and/or hardware that realizes a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementations in hardware, or a combination of software and hardware are also possible and contemplated.
本实施例提供一种基于物化视图日志的增量同步装置,如图4所示,包括:This embodiment provides an incremental synchronization device based on materialized view logs, as shown in Figure 4, including:
获取模块41,用于获取源数据库中的数据源表,创建数据源表对应的物化视图日志。其中,物化视图日志包括多个属性列,该属性列用于确定所述数据源表的增量操作所涉及的字段和更新值。The obtaining
第一确定模块42,用于当产生针对于数据源表的增量操作时,确定增量操作对应的字段。The first determining
提取模块43,用于当增量操作对应的字段处于属性列中,则从物化视图日志中提取增量操作对应的增量数据。The
同步模块44,用于将增量数据同步至目标数据库。Synchronization module 44, configured to synchronize the incremental data to the target database.
作为一个可选地实施方式,多个属性列包括字段变更向量列、增量操作类型列以及数据列,上述第一确定模块42可以包括:As an optional implementation manner, the multiple attribute columns include a field change vector column, an incremental operation type column, and a data column, and the above-mentioned
更新子模块,用于当产生针对于数据源表的增量操作时,将增量操作更新在增量操作类型列。The update sub-module is used to update the incremental operation in the incremental operation type column when the incremental operation for the data source table is generated.
第一确定子模块,用于基于增量操作类型列、字段变更向量列以及数据列,确定对应于增量操作的字段。The first determining submodule is used to determine the field corresponding to the incremental operation based on the incremental operation type column, the field change vector column and the data column.
作为一个可选的实施方式,上述第二确定子模块可以包括:As an optional implementation manner, the above-mentioned second determining submodule may include:
获取子单元,用于基于物化视图日志的主键信息,确定相同主键信息对应的多个增量操作。The acquisition subunit is used to determine multiple incremental operations corresponding to the same primary key information based on the primary key information of the materialized view log.
合并子单元,用于当增量操作存在多个时,基于预设规则对多个增量操作进行合并,得到目标增量操作。The merging subunit is used for merging multiple incremental operations based on preset rules to obtain a target incremental operation when there are multiple incremental operations.
确定子单元,用于基于字段变更向量列、目标增量操作以及数据列,确定目标增量操作在物化视图日志中的字段。Determine the subunit, used to determine the field in the materialized view log of the target incremental operation based on the field change vector column, the target incremental operation, and the data column.
具体地,增量操作的类型包括插入操作、更新操作和删除操作,上述合并子单元用于:Specifically, the type of incremental operation includes insert operation, update operation and delete operation, and the above-mentioned merging subunit is used for:
当多个增量操作依次为插入操作和更新操作时,基于预设规则合并多个增量操作,确定目标增量操作为插入操作。When multiple incremental operations are insert operations and update operations in sequence, the multiple incremental operations are combined based on preset rules, and the target incremental operation is determined to be an insert operation.
当多个增量操作依次为插入操作和删除操作时,基于预设规则合并增量操作,确定目标增量操作为删除操作。When multiple incremental operations are insertion operations and deletion operations in sequence, the incremental operations are combined based on preset rules, and the target incremental operation is determined to be a deletion operation.
当多个增量操作均为更新操作时,基于预设规则合并增量操作,确定目标增量操作为更新操作。When multiple incremental operations are update operations, the incremental operations are combined based on preset rules, and the target incremental operation is determined to be an update operation.
当多个增量操作依次为更新操作和删除操作时,基于预设规则合并增量操作,确定目标增量操作为删除操作。When multiple incremental operations are an update operation and a delete operation in sequence, the incremental operations are combined based on preset rules, and the target incremental operation is determined to be a delete operation.
当多个增量操作依次为删除操作和插入操作时,基于预设规则合并增量操作,确定目标增量操作为插入操作。When multiple incremental operations are deletion operations and insertion operations in sequence, the incremental operations are merged based on preset rules, and the target incremental operation is determined to be an insertion operation.
作为一个可选的实施方式,上述获取模块41可以包括:As an optional implementation manner, the
第一获取子模块,用于获取数据源表的元数据,该元数据包括数据主键、数据名称以及数据类型。The first obtaining sub-module is used to obtain metadata of the data source table, and the metadata includes data primary key, data name and data type.
生成子模块,用于,基于元数据的数据主键、数据名称以及数据类型,生成对应于元数据的物化视图日志。The generation sub-module is used to, based on the data primary key, data name and data type of the metadata, generate a materialized view log corresponding to the metadata.
作为一个可选的实施方式,上述基于物化视图日志的增量同步装置还可以包括:As an optional implementation manner, the above-mentioned incremental synchronization device based on materialized view logs may also include:
第一判断模块,用于判断增量操作对应的字段是否为大数据字段。The first judging module is used to judge whether the field corresponding to the incremental operation is a large data field.
第二判断模块,用于当增量操作对应的字段非大数据字段时,判断增量操作对应的字段是否处于属性列中。The second judging module is used for judging whether the field corresponding to the incremental operation is in the attribute column when the field corresponding to the incremental operation is not a large data field.
作为一个可选的实施方式,上述基于物化视图日志的增量同步装置还可以包括:As an optional implementation manner, the above-mentioned incremental synchronization device based on materialized view logs may also include:
清理模块,用于对物化视图日志进行数据清理。The cleaning module is used for data cleaning of materialized view logs.
本实施例中的基于物化视图日志的增量同步装置是以功能单元的形式来呈现,这里的单元是指ASIC电路,执行一个或多个软件或固定程序的处理器和存储器,和/或其他可以提供上述功能的器件。The materialized view log-based incremental synchronization device in this embodiment is presented in the form of a functional unit, where a unit refers to an ASIC circuit, a processor and a memory that execute one or more software or fixed programs, and/or other A device that can provide the above functions.
上述各模块的更进一步的功能描述与上述对应实施例相同,在此不再赘述。The further functional descriptions of the above modules are the same as those in the corresponding embodiments above, and will not be repeated here.
本发明实施例提供的基于物化视图日志的增量同步装置,通过在数据源表对应的物化视图日志中指定属性列,在产生增量操作时,确定增量操作所对应的字段。若该字段处于属性列中则从物化视图日志中提取增量数据,由此无需回表查询,减少了回表查询的次数,提升了增量数据的同步效率,提高了数据库增量同步的性能。In the incremental synchronization device based on the materialized view log provided by the embodiment of the present invention, by specifying the attribute column in the materialized view log corresponding to the data source table, when an incremental operation is generated, the field corresponding to the incremental operation is determined. If the field is in the attribute column, the incremental data is extracted from the materialized view log, thus eliminating the need to return to the table for query, reducing the number of table return queries, improving the synchronization efficiency of incremental data, and improving the performance of database incremental synchronization .
本发明实施例还提供一种计算机设备,该计算机设备中设置有源数据库或目标数据库,具有上述图5所示的基于物化视图日志的增量同步装置。An embodiment of the present invention also provides a computer device, which is provided with a source database or a target database, and has the above-mentioned incremental synchronization device based on the materialized view log shown in FIG. 5 .
请参阅图6,图6是本发明可选实施例提供的一种计算机设备的结构示意图,如图6所示,该计算机设备可以包括:至少一个处理器501,例如中央处理器(Central ProcessingUnit,CPU),至少一个通信接口503,存储器504,至少一个通信总线502。其中,通信总线502用于实现这些组件之间的连接通信。其中,通信接口503可以包括显示屏(Display)、键盘(Keyboard),可选通信接口503还可以包括标准的有线接口、无线接口。存储器504可以是高速易挥发性随机存取存储器(Random Access Memory,RAM),也可以是非不稳定的存储器(non-volatile memory),例如至少一个磁盘存储器。存储器504可选的还可以是至少一个位于远离前述处理器501的存储装置。其中处理器501可以结合图5所描述的装置,存储器504中存储应用程序,且处理器501调用存储器504中存储的程序代码,以用于执行上述任一方法步骤。Please refer to FIG. 6. FIG. 6 is a schematic structural diagram of a computer device provided in an optional embodiment of the present invention. As shown in FIG. 6, the computer device may include: at least one
其中,通信总线502可以是外设部件互连标准(peripheral componentinterconnect,PCI)总线或扩展工业标准结构(extended industry standardarchitecture,EISA)总线等。通信总线502可以分为地址总线、数据总线、控制总线等。为便于表示,图6中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。Wherein, the
其中,存储器504可以包括易失性存储器(volatile memory),例如随机存取存储器(random-access memory,RAM);存储器也可以包括非易失性存储器(non-volatilememory),例如快闪存储器(flash memory),硬盘(hard disk drive,HDD)或固态硬盘(solid-state drive,SSD);存储器504还可以包括上述种类的存储器的组合。Wherein, the
其中,处理器501可以是中央处理器(central processing unit,CPU),网络处理器(network processor,NP)或者CPU和NP的组合。Wherein, the
其中,处理器501还可以进一步包括硬件芯片。上述硬件芯片可以是专用集成电路(application-specific integrated circuit,ASIC),可编程逻辑器件(programmablelogic device,PLD)或其组合。上述PLD可以是复杂可编程逻辑器件(complexprogrammable logic device,CPLD),现场可编程逻辑门阵列(field-programmable gatearray,FPGA),通用阵列逻辑(generic array logic,GAL)或其任意组合。Wherein, the
可选地,存储器504还用于存储程序指令。处理器501可以调用程序指令,实现如本申请图2至图4实施例中所示的基于物化视图日志的增量同步方法。Optionally, the
本发明实施例还提供了一种非暂态计算机存储介质,所述计算机存储介质存储有计算机可执行指令,该计算机可执行指令可执行上述任意方法实施例中的基于物化视图日志的增量同步方法。其中,所述存储介质可为磁碟、光盘、只读存储记忆体(Read-OnlyMemory,ROM)、随机存储记忆体(Random Access Memory,RAM)、快闪存储器(FlashMemory)、硬盘(Hard Disk Drive,缩写:HDD)或固态硬盘(Solid-State Drive,SSD)等;所述存储介质还可以包括上述种类的存储器的组合。An embodiment of the present invention also provides a non-transitory computer storage medium, the computer storage medium stores computer-executable instructions, and the computer-executable instructions can perform incremental synchronization based on materialized view logs in any of the above method embodiments method. Wherein, the storage medium can be a magnetic disk, an optical disk, a read-only memory (Read-OnlyMemory, ROM), a random access memory (Random Access Memory, RAM), a flash memory (FlashMemory), a hard disk (Hard Disk Drive) , abbreviation: HDD) or a solid-state hard drive (Solid-State Drive, SSD), etc.; the storage medium may also include a combination of the above-mentioned types of memory.
虽然结合附图描述了本发明的实施例,但是本领域技术人员可以在不脱离本发明的精神和范围的情况下做出各种修改和变型,这样的修改和变型均落入由所附权利要求所限定的范围之内。Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the present invention. within the bounds of the requirements.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210911027.0A CN115470290A (en) | 2022-07-29 | 2022-07-29 | Increment synchronization method and device based on materialized view logs and computer equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210911027.0A CN115470290A (en) | 2022-07-29 | 2022-07-29 | Increment synchronization method and device based on materialized view logs and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115470290A true CN115470290A (en) | 2022-12-13 |
Family
ID=84366614
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210911027.0A Pending CN115470290A (en) | 2022-07-29 | 2022-07-29 | Increment synchronization method and device based on materialized view logs and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115470290A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116662354A (en) * | 2023-02-11 | 2023-08-29 | 嘉实远见科技(北京)有限公司 | A Method of Real-time Incremental Extraction of Database View Data |
CN118394771A (en) * | 2024-05-27 | 2024-07-26 | 浪潮卓数大数据产业发展有限公司 | Database data updating synchronization method, device, equipment and medium |
-
2022
- 2022-07-29 CN CN202210911027.0A patent/CN115470290A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116662354A (en) * | 2023-02-11 | 2023-08-29 | 嘉实远见科技(北京)有限公司 | A Method of Real-time Incremental Extraction of Database View Data |
CN118394771A (en) * | 2024-05-27 | 2024-07-26 | 浪潮卓数大数据产业发展有限公司 | Database data updating synchronization method, device, equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109508355B (en) | A data extraction method, system and terminal device | |
WO2020073854A1 (en) | Memory data management method and system and method and system for data maintenance in memory | |
CN107301214B (en) | Data migration method and device in HIVE and terminal equipment | |
CN107273506A (en) | A kind of method of database multi-list conjunctive query | |
CN111209344A (en) | Data synchronization method and device | |
CN107977396B (en) | A kind of updating method and table data updating device of data table of KeyValue database | |
CN110569243B (en) | Data query method, data query plug-in and data query server | |
CN115470290A (en) | Increment synchronization method and device based on materialized view logs and computer equipment | |
CN105373541A (en) | Processing method and system for data operation request of database | |
CN113946565B (en) | Method and system for supporting heterogeneous data database and table migration | |
CN113282579B (en) | Heterogeneous data storage and retrieval method, device, equipment and storage medium | |
CN109388659B (en) | Data storage method, device and computer readable storage medium | |
CN112783927A (en) | Database query method and system | |
US20180349443A1 (en) | Edge store compression in graph databases | |
KR101588375B1 (en) | Method and system for managing database | |
CN114637762A (en) | A method, device and electronic device for generating an operation record | |
CN114153857A (en) | Data synchronization method, data synchronization apparatus, and computer-readable storage medium | |
CN111651531B (en) | Data importing method, device, equipment and computer storage medium | |
CN115455023B (en) | A method and device for processing view data table | |
CN107609068B (en) | Data non-inductive migration method | |
CN115994148B (en) | Multi-table data updating method and device, electronic equipment and readable storage medium | |
CN112527911B (en) | Data storage method, device, equipment and medium | |
CN114116907A (en) | A database synchronization method, device, electronic device and storage medium | |
CN115510051A (en) | Data processing method, query method, device and electronic equipment | |
CN110633271A (en) | Data extraction method from Hbase to mysql based on json |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |