CN103902585A - Data loading method and system - Google Patents
- Publication number
- CN103902585A (application CN201210580016.5A)
- Authority
- CN
- China
- Prior art keywords
- data block
- data
- loading state
- collection point
- identifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
- H04L67/5682—Policies or rules for updating, deleting or replacing the stored data
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical Field
The present application relates to the field of data processing technology, and in particular to a data loading method and system.
Background
In fields such as the Internet and telecommunications, large volumes of data often need to be loaded into a designated data warehouse. Figure 1 is a schematic diagram of a commonly used data loading system.
As shown in Figure 1, the current data loading system includes a master node 101, proxy service nodes 102, collection points 103, and a data warehouse 104. Each proxy service node 102 is bound to a specific collection point 103. In the example of Figure 1, proxy service nodes A and B are both bound to collection point A, and proxy service nodes C and D are both bound to collection point B.
The master node 101 starts or stops the proxy service nodes 102 and the collection points 103.
Each proxy service node 102 includes a storage module and a proxy service module. The storage module stores the data to be loaded; the proxy service module reads that data from the storage module and sends it to the collection point 103 bound to the proxy service node 102.
The collection point 103 writes the received data into the data warehouse 104 through the interface that the data warehouse 104 provides, thereby completing the data load.
After reading the data to be loaded from the storage module, the proxy service module parses it according to a given format. The data to be loaded is generally in file form. To ensure that no data is lost during loading, the proxy service module performs no other processing after parsing a data file: it simply renames the file as a hidden file and sends that hidden file directly to the collection point 103 bound to the proxy service node 102 on which the module resides.
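In practice, the data loading system sometimes encounters abnormal conditions, for example the failure of one or more collection points 103.
Currently, when a collection point 103 fails, the proxy service node 102 bound to it gets an error when sending data to it, and thereby learns that its bound collection point 103 has failed.
Because proxy service nodes 102 and collection points 103 are bound to each other in the current system, each proxy service node 102 can load data only through the specific collection point 103 to which it is bound. Consequently, when some collection points 103 fail, the data stored on the proxy service nodes 102 bound to them cannot be loaded into the data warehouse 104, and data is lost.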
Summary of the Invention
In view of this, the present application provides a data loading method and system that solve the problem of some data failing to load into the data warehouse because a collection point has failed.
A data loading method, comprising:
configuring a mapping between data block identifiers (IDs) and collection points and, when a collection point fails, reconfiguring the data block IDs mapped to the failed collection point so that they map to other collection points that have not failed;
dividing the data to be loaded into data blocks according to a preset rule, assigning an ID to each data block, obtaining the mapping between data block IDs and collection points and, according to that mapping, sending each data block to the collection point to which its ID is mapped;
writing, by the collection point, the data block into the data warehouse.
A data loading system, comprising a master node, proxy service nodes, and collection points;
the master node is configured to configure the mapping between data block identifiers (IDs) and collection points and, when a collection point fails, to reconfigure the data block IDs mapped to the failed collection point so that they map to other collection points that have not failed;
the proxy service node is configured to divide the data to be loaded into data blocks according to a preset rule, assign an ID to each data block, obtain the mapping between data block IDs and collection points and, according to that mapping, send each data block to the collection point to which its ID is mapped;
the collection point is configured to write data blocks into the data warehouse.
It can be seen that, when loading data, the present invention does not bind each proxy service node to a collection point as the prior art does, where the data a proxy service node is responsible for can be loaded into the data warehouse only through its bound collection point. Instead, the data to be loaded is divided into finer-grained data blocks, each data block is assigned an ID, a mapping between data block IDs and collection points is established, and the data blocks each collection point is responsible for loading are determined from that mapping. Once one or more collection points are found to have failed, the mapping is reconfigured: the data block IDs mapped to the failed collection points are remapped to other collection points that have not failed, so that the data blocks the failed collection points were responsible for can be written into the data warehouse through those other collection points. This solves the prior-art problem of some data failing to load into the data warehouse because a collection point has failed.
Brief Description of the Drawings
Figure 1 is a schematic diagram of a commonly used data loading system.
Figure 2 is a flowchart of the data loading method provided by the present invention.
Figure 3 is a schematic diagram of the data loading system provided by the present invention.
Detailed Description
Figure 2 is a flowchart of the data loading method provided by the present invention.
As shown in Figure 2, the flow includes:
Step 201: configure the mapping between data block identifiers (IDs) and collection points.
In this step, the mapping between data block IDs and collection points is generally configured by the master node.
The method for generating data block IDs can be preset, so the range of data block IDs is known in advance, and it can then be determined which data block IDs correspond to each collection point. For example, the hash value of a data block can be used as its ID, or a random number produced by a random number generator can be used as its ID, provided that every data block ID is unique.
To improve efficiency, the mapping can be stored in the memory of the master node. To increase the number of data blocks that can be mapped, when a mapping table is used to store the mapping, the table can store mappings between data block ID intervals and collection points rather than one entry per ID.
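The interval-based mapping table described above can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes integer IDs and non-overlapping inclusive intervals, and the names `MappingTable`, `lookup`, and the collection point labels `cp1` to `cp3` are hypothetical.

```python
import bisect


class MappingTable:
    """Maps inclusive data block ID intervals to collection point names."""

    def __init__(self, intervals):
        # intervals: iterable of (low, high, collection_point), assumed
        # non-overlapping; sorting by lower bound enables binary search.
        self._intervals = sorted(intervals)
        self._lows = [low for low, _, _ in self._intervals]

    def lookup(self, data_block_id):
        # Find the last interval whose lower bound is <= the ID.
        i = bisect.bisect_right(self._lows, data_block_id) - 1
        if i >= 0:
            low, high, point = self._intervals[i]
            if low <= data_block_id <= high:
                return point
        raise KeyError(f"no collection point mapped for ID {data_block_id}")


# Hypothetical table: three collection points, 100 IDs each.
table = MappingTable([(1, 100, "cp1"), (101, 200, "cp2"), (201, 300, "cp3")])
```

Storing one row per interval keeps the table size proportional to the number of collection points rather than to the number of data blocks, which is the space saving the text refers to.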
Step 202: when a collection point fails, reconfigure the data block IDs mapped to the failed collection point so that they map to other collection points that have not failed.
Several methods can be used to determine whether a collection point has failed. For example, when the master node configures the mapping between data block IDs and collection points, each collection point can periodically report its status to the master node; if the master node does not receive a status report from a collection point within a specified time, it can conclude that the collection point has failed.
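The periodic status report can be sketched as a simple timeout check on the master node. This is an illustrative assumption about how such a check might look; `HeartbeatMonitor`, `report`, `failed_points`, and the injectable `clock` parameter are hypothetical names, and the application does not prescribe a timeout value.

```python
import time


class HeartbeatMonitor:
    """Tracks the time of the last status report from each collection point."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_seen = {}

    def report(self, collection_point):
        # Called whenever a collection point reports its status.
        self._last_seen[collection_point] = self._clock()

    def failed_points(self):
        # A point is treated as failed if no report arrived within the timeout.
        now = self._clock()
        return sorted(
            point
            for point, seen in self._last_seen.items()
            if now - seen > self._timeout
        )
```

A master node following this sketch would call `failed_points()` on a timer and trigger the reconfiguration of step 202 for each name returned.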
Step 203: divide the data to be loaded into data blocks according to a preset rule, and assign an ID to each data block.
The size of a data block can be chosen according to actual needs. For example, for a file, every 1000 records can form one data block; if fewer than 1000 records remain at the end of the file, those remaining records form the file's last data block.
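The 1000-record rule can be sketched as a generator. The sequential IDs used here are an assumption made for brevity; the text mentions hash values or random numbers as ID sources and only requires that IDs be unique. The function name `split_into_blocks` and its parameters are hypothetical.

```python
from itertools import islice


def split_into_blocks(records, block_size=1000, first_id=1):
    """Yield (block_id, block) pairs; the last block may hold fewer records."""
    iterator = iter(records)
    next_id = first_id
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return
        # Sequential IDs are an illustrative choice, not the source's rule.
        yield next_id, block
        next_id += 1
```

For a file of 2500 records this produces three blocks of 1000, 1000, and 500 records.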
Step 204: obtain the mapping between data block IDs and collection points and, according to that mapping, send each data block to the collection point to which its ID is mapped.
In this step, an error when sending a data block to a collection point generally indicates that the collection point has failed, and the mapping between data block IDs and collection points may already have been reconfigured. The sender can therefore actively fetch the mapping again, determine from the refreshed mapping which collection point the data block now maps to, and send the data block to that collection point. Alternatively, the master node can push the latest mapping after each reconfiguration.
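The refetch-and-resend behaviour can be sketched as follows. The callables `get_mapping` and `send`, the use of `ConnectionError` to signal a failed send, and the `max_attempts` limit are all assumptions for illustration; the application does not specify a transport or a retry count.

```python
def send_block(block_id, payload, get_mapping, send, max_attempts=3):
    """Send a block to its mapped collection point, refreshing the mapping on error."""
    mapping = get_mapping()  # e.g. fetched from the master node
    for _ in range(max_attempts):
        point = mapping[block_id]
        try:
            send(point, block_id, payload)
            return point
        except ConnectionError:
            # The point has probably failed and the mapping may have been
            # reconfigured, so fetch it again before the next attempt.
            mapping = get_mapping()
    raise RuntimeError(f"block {block_id} could not be delivered")
```

Bounding the number of attempts avoids looping forever if the master node has not yet noticed the failure; a real sender would likely also wait between attempts.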
Steps 201 to 204 may be executed in any order, or concurrently, provided that no logical contradiction arises; for example, steps 201 and 203 may be executed at the same time.
Step 205: the collection point writes the data block into the data warehouse.
In the present invention, reconfiguring the data block IDs mapped to a failed collection point so that they map to other collection points that have not failed may specifically include:
Depending on the load of the other collection points that have not failed, a collection point whose load meets a predetermined condition is selected, for example one whose load is below a predetermined value or the one with the lowest load, and the data block IDs mapped to the failed collection point are remapped to the selected collection point. Alternatively, the data block IDs mapped to the failed collection point are divided, evenly or in a given proportion, into two or more parts, and each part is mapped to a different collection point that has not failed.
For example, suppose there are five collection points: collection point 1 is mapped to IDs 1 to 100, collection point 2 to IDs 101 to 200, collection point 3 to IDs 201 to 300, collection point 4 to IDs 301 to 400, and collection point 5 to IDs 401 to 500. When collection point 2 fails, if collection point 5 is found to be the least busy, collection point 5 is then mapped to data block IDs 101 to 200 as well as 401 to 500. If the load across the collection points is fairly balanced, the range 101 to 200 can instead be split into two intervals, 101 to 150 and 151 to 200, and two other collection points, for example collection points 1 and 3, are selected and mapped to IDs 101 to 150 and 151 to 200 respectively.
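Both reassignment strategies in the example can be sketched in a few lines. The data layout (a dict from collection point to a list of inclusive intervals), the `loads` dict, and the function names are hypothetical; `loads` is assumed to contain only collection points that have not failed, and `split_interval` assumes `parts` does not exceed the number of IDs in the interval.

```python
def reassign_to_least_loaded(mapping, failed, loads):
    """Move all intervals of the failed point to the least-loaded survivor."""
    target = min(loads, key=loads.get)
    new_mapping = {p: list(iv) for p, iv in mapping.items() if p != failed}
    new_mapping[target].extend(mapping[failed])
    return new_mapping


def split_interval(low, high, parts):
    """Split inclusive [low, high] into `parts` contiguous, near-equal intervals."""
    base, extra = divmod(high - low + 1, parts)
    result, start = [], low
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        result.append((start, start + size - 1))
        start += size
    return result
```

With the five collection points of the example, failing point 2 while point 5 has the lowest load moves the interval 101 to 200 onto point 5, and `split_interval(101, 200, 2)` yields the two halves that the text assigns to points 1 and 3.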
To avoid loading data more than once, the present invention further proposes maintaining loading state information for every data block a collection point receives. When writing a data block into the data warehouse, the collection point uses that information to judge whether the received data block has already been loaded into the data warehouse; if so, it does not write the block, and otherwise it writes the block into the data warehouse.
To avoid duplicate loading more thoroughly, covering not only data blocks that have been fully loaded into the data warehouse but also, as far as possible, data blocks that have been only partially loaded, the present invention further proposes the following:
The loading state information includes a loaded state, a not-loaded state, and a loading state. The collection point looks up the loading state of a data block by the ID of the received block. If the state is loaded, it discards the data block. If the state is not loaded, it writes the data block into the data warehouse. If the state is loading, it uses the data block ID to find the content of that data block already loaded in the data warehouse, deletes that content, and then writes the data block into the data warehouse.
To let the collection point quickly determine by data block ID whether the data warehouse already holds content for that data block, the collection point sends the data warehouse both the content of the data block and its ID. The data warehouse stores the content together with the correspondence between the content and the data block ID. The data warehouse may also include a dedicated query module that, given a data block ID, returns what the data warehouse has stored for that ID.
To keep the maintained loading state information accurate, the present invention proposes that maintaining it may specifically include:
maintaining a data block loading state table that stores the correspondence between each data block ID and the loading state of the corresponding data block. When the table is initialized, the loading state for every data block ID is set to not loaded. After a data block is received and the state for its ID is found to be not loaded, the state is changed to loading. After the data block has been loaded completely and successfully, the state is changed to loaded. After a data block is received, the state for its ID is found to be loading, and the content already loaded for that data block has been deleted from the data warehouse, the state is changed back to not loaded.
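The three-state table and the write decision can be sketched together. This is a simplified illustration under stated assumptions: a plain dict stands in for the data warehouse and its query-by-ID module, a missing table entry is treated as the not-loaded state instead of initializing every ID up front, and the names `LoadState`, `CollectionPoint`, and `handle` are hypothetical.

```python
from enum import Enum


class LoadState(Enum):
    NOT_LOADED = "not loaded"
    LOADING = "loading"
    LOADED = "loaded"


class CollectionPoint:
    """Writes blocks to a warehouse while tracking a per-block loading state."""

    def __init__(self, warehouse):
        # `warehouse` is a dict {block_id: [records]} standing in for the
        # data warehouse (an assumption made for this sketch).
        self.warehouse = warehouse
        self.states = {}  # block_id -> LoadState; missing means NOT_LOADED

    def handle(self, block_id, records):
        """Return True if the block was written, False if it was discarded."""
        state = self.states.get(block_id, LoadState.NOT_LOADED)
        if state is LoadState.LOADED:
            return False  # already fully loaded: discard the duplicate
        if state is LoadState.LOADING:
            # An earlier attempt stopped part-way: delete the partial content,
            # then return to the not-loaded state before retrying.
            self.warehouse.pop(block_id, None)
            self.states[block_id] = LoadState.NOT_LOADED
        self.states[block_id] = LoadState.LOADING
        stored = self.warehouse.setdefault(block_id, [])
        for record in records:
            stored.append(record)
        self.states[block_id] = LoadState.LOADED
        return True
```

Because the state is set to loading before the first record is written and to loaded only after the last, a crash in between leaves the block marked as loading, which is what lets a later delivery of the same block clean up the partial content. The sketch keeps the table in memory; surviving a collection point restart would require persisting it, which the text does not detail.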
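The present invention also provides a data loading system; see Figure 3 for details.
Figure 3 is a schematic diagram of the data loading system provided by the present invention.
As shown in Figure 3, the system includes a master node 301, proxy service nodes 302, and collection points 303. According to the mapping between data block IDs and collection points configured by the master node 301, each proxy service node 302 sends data to the collection point mapped to the ID of the data block that contains the data. In the example of Figure 3, the IDs of the data blocks on proxy service node A map to collection point A, and the IDs of the data blocks on proxy service nodes B, C, and D map to collection point B.
Specifically, the components of the system shown in Figure 3 function as follows:
The master node 301 configures the mapping between data block identifiers (IDs) and collection points and, when a collection point 303 fails, reconfigures the data block IDs mapped to the failed collection point 303 so that they map to other collection points 303 that have not failed.
To save storage space on the master node 301, the mapping may consist of mappings between data block ID intervals and collection points 303.
The proxy service node 302 divides the data to be loaded into data blocks according to a preset rule, assigns an ID to each data block, obtains the mapping between data block IDs and collection points and, according to that mapping, sends each data block to the collection point to which its ID is mapped.
The collection point 303 writes data blocks into the data warehouse.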
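There are generally two or more proxy service nodes 302 and two or more collection points 303. The data to be loaded is generally stored in a designated storage space on a designated proxy service node 302, and each proxy service node 302 divides the data it stores into blocks.
A proxy service node 302 generally includes a proxy service module, which divides the data to be loaded into blocks, assigns each data block a unique ID, obtains the mapping between data block IDs and collection points, and sends each data block to the corresponding collection point 303 according to that mapping.
A collection point 303 can periodically report its status to the master node 301. When the master node 301 does not receive a status report from a collection point 303 within the specified time, it concludes that the collection point 303 has failed.
The master node 301 can also use other methods to determine whether a collection point 303 has failed. For example, when a proxy service node 302 gets an error sending data to a collection point 303, it can report to the master node 301 that the collection point 303 has failed, which triggers the master node 301 to reconfigure the mapping between data block IDs and collection points 303.
When the master node 301 determines that a collection point 303 has failed, it can select, based on the load of the other collection points 303 that have not failed, a collection point 303 whose load meets a predetermined condition, and remap the data block IDs of the failed collection point 303 to the selected collection point 303. Alternatively, it can divide the data block IDs mapped to the failed collection point 303, evenly or in a given proportion, into two or more parts and map each part to a different collection point 303 that has not failed.
When a proxy service node 302 gets an error sending a data block to a collection point 303, it can fetch the mapping between data block IDs and collection points from the master node 301 again, determine from the refreshed mapping which collection point 303 the data block maps to, and send the data block to that collection point 303.
A proxy service node 302 can also receive the updated mapping table pushed by the master node 301 and use it to determine the collection point 303 to which each data block maps.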
To avoid loading data more than once, a collection point 303 may include a state maintenance module and a writing module.
The state maintenance module maintains the loading state information of every data block the collection point receives.
The writing module uses the loading state information of a data block to judge whether the received data block has already been loaded into the data warehouse; if so, it does not write the received data block into the data warehouse, and otherwise it writes the block into the data warehouse.
The loading state information may specifically include a loaded state, a not-loaded state, and a loading state. The system may also include a data warehouse.
The writing module sends the data warehouse both the content of a data block and the ID of that data block.
The data warehouse stores the data block content sent by the collection point, together with the correspondence between the data block content and the data block ID.
The writing module looks up the loading state of a data block by the ID of the received block. If the state is loaded, it discards the data block. If the state is not loaded, it writes the data block into the data warehouse. If the state is loading, it uses the data block ID to find the content of that data block already loaded in the data warehouse, deletes that content, and then writes the data block into the data warehouse.
The state maintenance module may specifically maintain a data block loading state table that stores the correspondence between each data block ID and the loading state of the corresponding data block. When the table is initialized, the loading state for every data block ID is set to not loaded. After a data block is received and the state for its ID is found to be not loaded, the state is changed to loading. After the data block has been loaded completely and successfully, the state is changed to loaded. After a data block is received, the state for its ID is found to be loading, and the content already loaded for that data block has been deleted from the data warehouse, the state is changed back to not loaded.
It can be seen that, in the present invention, once one or more collection points are found to have failed, the mapping between data block IDs and collection points is reconfigured: the data block IDs mapped to the failed collection points are remapped to other collection points that have not failed, so that the data blocks the failed collection points were responsible for can be written into the data warehouse through those other collection points. Data loading thus proceeds without interruption and without loss, which ensures its consistency and timeliness.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (14)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201210580016.5A | 2012-12-27 | 2012-12-27 | Data loading method and system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201210580016.5A | 2012-12-27 | 2012-12-27 | Data loading method and system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN103902585A (en) | 2014-07-02 |
Family
ID=50993913
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201210580016.5A (Pending) | Data loading method and system | 2012-12-27 | 2012-12-27 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN103902585A (en) |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030065549A1 (en) * | 2001-03-23 | 2003-04-03 | Restaurant Services, Inc. | System, method and computer program product for a promotion reporting interface in a supply chain management framework |
| CN101465877A (en) * | 2007-12-17 | 2009-06-24 | 诺基亚西门子通信公司 | Load distribution in distributed database system |
| US20090254572A1 (en) * | 2007-01-05 | 2009-10-08 | Redlich Ron M | Digital information infrastructure and method |
| CN101595666A (en) * | 2006-06-30 | 2009-12-02 | 艾姆巴克控股有限公司 | System and method for managing user usage of a communication network |
| CN101755427A (en) * | 2007-04-10 | 2010-06-23 | 阿珀蒂奥有限公司 | Improved sub-tree access control in network architectures |
| CN102460389A (en) * | 2009-05-02 | 2012-05-16 | 思杰系统有限公司 | System and method for launching applications into an existing isolation environment |
| CN102497353A (en) * | 2011-10-28 | 2012-06-13 | 深圳第七大道科技有限公司 | Processing method, server and system for multi-server distributed data |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110083651A (en) * | 2015-11-20 | 2019-08-02 | 杭州数梦工场科技有限公司 | A kind of method and apparatus of data load |
| CN110083651B (en) * | 2015-11-20 | 2021-06-29 | 杭州数梦工场科技有限公司 | Data loading method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20140702 |