
CN116644202A - Method and device for storing large-data-volume remote sensing image data - Google Patents

Method and device for storing large-data-volume remote sensing image data

Info

Publication number
CN116644202A
Authority
CN
China
Prior art keywords
data
nodes
remote sensing
sensing image
image data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310667083.9A
Other languages
Chinese (zh)
Inventor
周宁
杨毅
钟普天
刘宁山
徐斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yizhirui Information Technology Co ltd
Original Assignee
Beijing Jietai Yunji Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jietai Yunji Information Technology Co ltd
Priority to CN202310667083.9A
Publication of CN116644202A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the field of big data, and in particular to a method and device for storing large-data-volume remote sensing image data. The storage method includes: dividing the data into N segments; dividing the storage space into M nodes; setting one of the M nodes as a configuration node and the remaining M-1 nodes as worker nodes that store data; and using the configuration node to store the N segments on the corresponding M-1 worker nodes. The method further includes: adding K worker nodes; and having the configuration node allocate data storage across the resulting M+K-1 nodes. The application improves efficiency in the storage process and helps ensure the stability and reliability of the system.

Description

Storage method and device for large-data-volume remote sensing image data

Technical field

The present application relates to the field of big data, and in particular to a method and device for storing large-data-volume remote sensing image data.

Background

In remote sensing spatial data storage today, once a database holds tens of millions or even hundreds of millions of records, data retrieval (attribute retrieval, spatial retrieval, spatial aggregation and so on) takes a long time, which slows system responses and causes interface timeouts. Traditional database and table sharding is a technique for horizontally partitioning data in a relational database. Its main goal here is to divide a large volume of spatial data into smaller blocks and distribute them across different databases, improving database performance and scalability. In general, sharding becomes necessary once the data volume reaches the limit that a single database can sustain. The technique rests on the idea of data dispersion: one large database is split into several smaller ones and the spatial data is spread among them. Sharding, however, increases the number of components in the system, which brings more management and configuration work and makes the whole system harder to develop, deploy and maintain. The target data may also be scattered across the databases, so spatial operations such as aggregation need several rounds of computation to produce a result, and the gain in storage efficiency is not significant.

Summary of the invention

To solve the technical problem that efficiency gains in the storage process are not significant, the present application provides a method and device for storing large-data-volume remote sensing image data.

The storage method for large-data-volume remote sensing image data provided by the present application adopts the following technical solution:

In a first aspect, a method for storing large-data-volume remote sensing image data is provided, including:

dividing the large-data-volume remote sensing image data into N segments;

dividing the storage space into M nodes;

setting one of the M nodes as a configuration node, and setting the remaining M-1 nodes as worker nodes that store the large-data-volume remote sensing image data;

using the configuration node to store the N segments of the data on the corresponding M-1 worker nodes.

Preferably, the method further includes:

adding K worker nodes;

having the configuration node allocate data storage across the M+K-1 nodes that exist after the worker nodes are added.

Preferably, the method further includes:

making P copies of the N segments and distributing them among the M-1 worker nodes.

Preferably, dividing the large-data-volume remote sensing image data into N segments includes:

creating a distributed index;

dividing the table into multiple fragments according to the structure of the distributed index;

dividing the data into N segments and adding them to the corresponding fragments of the table.

Preferably, the distributed index includes a distributed B-Tree and/or a hash index.

In a second aspect, a storage device for large-data-volume remote sensing image data is also provided, including:

a first division module, configured to divide the large-data-volume remote sensing image data into N segments;

a second division module, configured to divide the storage space into M nodes;

a setting module, configured to set one of the M nodes as a configuration node and the remaining M-1 nodes as worker nodes that store the large-data-volume remote sensing image data;

a first storage module, configured to use the configuration node to store the N segments of the large-data-volume remote sensing image data on the corresponding M-1 worker nodes.

Preferably, the device further includes:

an adding module, configured to add K worker nodes;

a second storage module, configured to have the configuration node allocate data storage across the M+K-1 nodes that exist after the worker nodes are added.

Preferably, the device further includes:

a replication module, configured to make P copies of the N segments and distribute them among the M-1 worker nodes.

Preferably, the first division module includes: a creation module, configured to create a distributed index;

an allocation module, configured to divide the table into multiple fragments according to the structure of the distributed index;

a third storage module, configured to divide the large-data-volume remote sensing image data into N segments and add them to the corresponding fragments of the table.

Preferably, the distributed index includes a distributed B-Tree and/or a hash index.

In summary, the present application provides at least one of the following beneficial technical effects:

1. It supports efficient storage of large-scale data. For spatial remote sensing metadata, the data can be sharded horizontally and different data stored on different nodes, which avoids single points of failure and data skew and improves the reliability and scalability of the whole system.

2. Data can be distributed to different nodes by spatial hashing. For spatial remote sensing metadata, a spatial hash algorithm can hash the data by spatial coordinates so that adjacent data is assigned to the same node. This improves query efficiency, reduces network overhead and improves the performance of the whole system.

3. The number of nodes and the data sharding can be adjusted in real time as the data volume grows, which helps ensure the stability and reliability of the system.

Brief description of the drawings

Fig. 1 shows a first embodiment of a method for storing large-data-volume remote sensing image data;

Fig. 2 shows a second embodiment of a method for storing large-data-volume remote sensing image data;

Fig. 3 shows a third embodiment of a method for storing large-data-volume remote sensing image data;

Fig. 4 shows an embodiment of dividing the data into N segments;

Fig. 5 shows a first embodiment of a storage device for large-data-volume remote sensing image data;

Fig. 6 shows a second embodiment of a storage device for large-data-volume remote sensing image data;

Fig. 7 shows a third embodiment of a storage device for large-data-volume remote sensing image data;

Fig. 8 shows an embodiment of the first division module.

Reference numerals: 1, storage device for large-data-volume remote sensing image data; 11, first division module; 12, second division module; 13, setting module; 14, first storage module; 15, adding module; 16, second storage module; 17, replication module; 111, creation module; 112, allocation module; 113, third storage module.

Detailed description

To make the purpose, technical solution and advantages of the present application clearer, the application is described in further detail below with reference to Figs. 1-8 and the embodiments. It should be understood that the specific embodiments described here only explain the present application and do not limit it.

The data storage method provided by the present application adopts the following technical solution:

In a first aspect, referring to Fig. 1, a method for storing large-data-volume remote sensing image data is provided, including:

S1: Divide the large-data-volume remote sensing image data into N segments. Because the data volume is very large, the data must be processed in segments. Without segmentation, a single item may end up stored across nodes or across servers; storing it that way is convenient, but accessing it later can cause access timeouts and other problems.

S2: Divide the storage space into M nodes. The storage space comprises various storage servers or storage media. Multiple storage devices may be grouped into a single node, and a single server may also be divided into multiple nodes. In this embodiment, the servers of the storage space are divided into M nodes, and a single node may comprise one server or several. The storage space is divided into M nodes to match the division of the data into N segments, so that the N segments can be allocated appropriately among the M nodes.

S3: Set one of the M nodes as a configuration node, and set the remaining M-1 nodes as worker nodes that store data. Only one configuration node is needed, although that node may comprise one server or several. The configuration node mainly coordinates data allocation, storage, retrieval and access for the remaining M-1 nodes. The worker nodes mainly store data. Deciding how the data is stored is the configuration node's main job. For example, the configuration node segments the data according to its length, matches the data segments against the storage space currently available on the worker nodes and, following a suitable matching rule, assigns different segments to the servers of different worker nodes.

S4: Use the configuration node to store the N segments of the large-data-volume remote sensing image data on the corresponding M-1 worker nodes. The purpose of this step is to store the large-data-volume remote sensing image data more effectively.
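Steps S1 to S4 can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `WorkerNode`, `ConfigNode` and `split_into_segments` are hypothetical, and the "most free space first" matching rule is one possible choice, since the application does not fix a specific rule.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerNode:
    """Hypothetical worker node with a fixed capacity in bytes."""
    name: str
    capacity: int
    segments: dict = field(default_factory=dict)  # segment id -> payload

    @property
    def free(self) -> int:
        # Remaining space after the segments already stored here.
        return self.capacity - sum(len(s) for s in self.segments.values())


class ConfigNode:
    """Hypothetical configuration node: decides and records segment placement."""

    def __init__(self, workers):
        self.workers = list(workers)
        self.placement = {}  # segment id -> worker name

    def store(self, segments):
        for seg_id, payload in enumerate(segments):
            # Assumed matching rule: choose the worker with the most free space.
            target = max(self.workers, key=lambda w: w.free)
            if target.free < len(payload):
                raise RuntimeError(f"no worker has room for segment {seg_id}")
            target.segments[seg_id] = payload
            self.placement[seg_id] = target.name


def split_into_segments(data: bytes, n: int) -> list:
    """S1: cut the data into exactly n contiguous segments of near-equal size."""
    total = len(data)
    return [data[total * i // n: total * (i + 1) // n] for i in range(n)]
```

With three workers (M-1 = 3) and five segments (N = 5), `ConfigNode(workers).store(split_into_segments(data, 5))` fills `placement` with one entry per segment, which is the bookkeeping a configuration node would need for later retrieval.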

Preferably, referring to Fig. 2, the method further includes:

S5: Add K worker nodes. In production, storage capacity may run short at any time for lack of servers. K worker nodes then need to be added, that is, K worker nodes that the configuration node did not previously manage. How to keep storing data sensibly once these K worker nodes are added is the technical problem this embodiment addresses.

S6: Have the configuration node allocate data storage across the M+K-1 nodes that exist after the worker nodes are added. Once K nodes have been added as worker nodes, their locations must be registered in the configuration node, that is, the information about these K nodes must be allocated in the configuration node. The procedure is as follows:

Scan tables: all tables that need rebalancing are scanned, and the number of partitions of each table is recorded.

Determine partition counts: for each table, the number of all its partitions is calculated and recorded.

Determine the allocation strategy: the strategy is chosen according to each table's partition counts. For example, if a table's partition counts are all equal, a load-balancing allocation strategy can be used; if some are much larger than others, an importance-based allocation strategy can be used.

Assign tasks: the allocation strategy is applied to each table and query tasks are assigned to the most suitable nodes. This step involves several algorithms and techniques, such as load-balancing, distance and importance algorithms.

Adjust the task distribution: after assignment, the task distribution is checked and adjusted as needed. For example, if one node holds too many tasks, some of them may be reassigned to other nodes to balance the distribution.

Repeat: the steps above are repeated until the partitions of all tables have been rebalanced.
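The rebalancing loop of S6 can be illustrated with a minimal sketch. It assumes the simplest load-balancing strategy (equalize shard counts per worker) and a hypothetical `rebalance` function; the importance-based and distance-based strategies mentioned above are not modelled.

```python
def rebalance(placement: dict, old_workers: list, new_workers: list) -> list:
    """Move shards until worker shard counts differ by at most one.

    placement maps shard id -> worker name and is updated in place.
    Returns the list of (shard, source, target) moves that were made.
    """
    workers = list(old_workers) + list(new_workers)

    # Scan step: record which shards each worker currently holds.
    load = {w: [] for w in workers}
    for shard, worker in sorted(placement.items()):
        load[worker].append(shard)

    moves = []
    while True:
        busiest = max(workers, key=lambda w: len(load[w]))
        idlest = min(workers, key=lambda w: len(load[w]))
        # Stop once the distribution is as even as it can be.
        if len(load[busiest]) - len(load[idlest]) <= 1:
            return moves
        # Adjust step: shift one shard from the busiest to the idlest worker.
        shard = load[busiest].pop()
        load[idlest].append(shard)
        placement[shard] = idlest
        moves.append((shard, busiest, idlest))
```

Starting from 12 shards spread evenly over three workers and adding one empty worker, the loop moves one shard from each original worker, leaving three shards on each of the four workers. Moving only the surplus shards, rather than reassigning everything, keeps the data transferred during expansion small.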

Preferably, referring to Fig. 3, the method further includes:

S7: Make P copies of the N segments and distribute them among the M-1 worker nodes. Making P copies amounts to increasing the replica count. A replica set can improve the availability and performance of the database, because when one node fails other nodes can take over its work. A replica set also improves data redundancy and security, because any node holding a replica can serve the data. A higher replica count, however, increases the demand for storage capacity and computing resources. The replicas are distributed among the M-1 worker nodes to ensure the security and redundancy of the replicas of each of the N segments.
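A minimal sketch of the replica placement in S7 follows. It assumes a round-robin rule and the extra constraint that the P copies of one segment land on P different workers; both are illustrative assumptions, and `place_replicas` is a hypothetical name.

```python
def place_replicas(n_segments: int, workers: list, copies: int) -> dict:
    """Return a map: segment id -> list of workers holding its copies.

    Assumed rule: copy r of segment i goes to worker (i + r) mod W, so the
    copies of one segment always sit on distinct workers.
    """
    if copies > len(workers):
        raise ValueError("cannot place more copies than there are workers")
    count = len(workers)
    return {
        seg: [workers[(seg + r) % count] for r in range(copies)]
        for seg in range(n_segments)
    }
```

For example, `place_replicas(4, ["w0", "w1", "w2"], 2)` puts segment 0 on `w0` and `w1` and segment 2 on `w2` and `w0`, so the loss of any single worker leaves at least one copy of every segment readable.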

Preferably, referring to Fig. 4, dividing the data into N segments includes:

S11: Create a distributed index.

S12: Divide the table into multiple fragments according to the structure of the distributed index. An index creation operation is performed on the table to create the distributed index, and the table is then divided into multiple fragments according to the index structure. Each fragment has a fixed size, which can be set according to actual needs.

S13: Divide the large-data-volume remote sensing image data into N segments and add them to the corresponding fragments of the table. Data pages are loaded into the corresponding fragments. The data page is the core data structure of a shard, and each data page holds a certain number of rows. Whenever a data page is inserted, updated or deleted, the state of its fragment must be updated.
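As a rough illustration of S11 to S13, the sketch below routes rows to table fragments through a hash index. The helper names `shard_of` and `build_shards` and the `scene_id` key are hypothetical, and a hash over a string key stands in for whatever distributed index a real system would use.

```python
import hashlib


def shard_of(key: str, shard_count: int) -> int:
    """Map a key to a fragment number using a hash that is stable across runs."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def build_shards(rows: list, key_field: str, shard_count: int) -> list:
    """Distribute rows into shard_count fragments by hashing one column."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        shards[shard_of(str(row[key_field]), shard_count)].append(row)
    return shards


# Example: 100 metadata rows keyed by a hypothetical scene identifier.
rows = [{"scene_id": f"scene-{i}", "size_mb": 10 + i} for i in range(100)]
shards = build_shards(rows, "scene_id", shard_count=4)
```

`hashlib` is used instead of the built-in `hash()` because Python randomizes string hashes per process, which would send the same key to different fragments on different runs.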

Preferably, the distributed index includes a distributed B-Tree and/or a hash index. These indexes can be partitioned across multiple nodes, which lets this embodiment handle tables with very many rows and complex geometric objects. The spatial table is partitioned horizontally so that different spatial data can be stored on different nodes, which lets this embodiment speed up spatial queries and support larger datasets. Queries can be processed automatically in parallel on multiple nodes, improving the parallel execution of spatial data operations and making query and analysis more efficient. The embodiment also scales horizontally: it can be extended to more nodes with little effort, so the whole system can grow to meet larger data demands. Data can be distributed to different nodes by spatial hashing. For spatial remote sensing metadata, a spatial hash algorithm can hash the data by spatial coordinates so that adjacent data is assigned to the same node. This improves query efficiency, reduces network overhead and improves the performance of the whole system.
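The spatial hashing idea can be sketched as a grid hash. This is an assumption-laden illustration: the one-degree cell size, the function names `cell_of` and `node_for`, and the two mixing primes are illustrative choices, not something the application specifies.

```python
import math


def cell_of(lon: float, lat: float, cell_deg: float = 1.0) -> tuple:
    """Snap a coordinate to the grid cell that contains it."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def node_for(lon: float, lat: float, node_count: int, cell_deg: float = 1.0) -> int:
    """Hash the grid cell, not the raw point, to pick a node.

    Every point inside one cell maps to the same node, so data that is
    spatially close (within a cell) is stored together.
    """
    cx, cy = cell_of(lon, lat, cell_deg)
    # Two large primes, chosen only for illustration, mix the cell indices.
    return ((cx * 73856093) ^ (cy * 19349663)) % node_count
```

Two scenes at (116.30, 39.90) and (116.45, 39.95) fall in the same one-degree cell and therefore on the same node, so a query over that cell touches a single node. Points that are close but straddle a cell boundary can still land on different nodes; a larger cell size or a space-filling curve would reduce that effect at the cost of coarser load distribution.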

In a second aspect, referring to Fig. 5, a storage device 1 for large-data-volume remote sensing image data is also provided, including:

a first division module 11, configured to divide the large-data-volume remote sensing image data into N segments;

a second division module 12, configured to divide the storage space into M nodes;

a setting module 13, configured to set one of the M nodes as a configuration node and the remaining M-1 nodes as worker nodes that store the large-data-volume remote sensing image data;

a first storage module 14, configured to use the configuration node to store the N segments of the data on the corresponding M-1 worker nodes.

Preferably, referring to Fig. 6, the device further includes:

an adding module 15, configured to add K worker nodes;

a second storage module 16, configured to have the configuration node allocate data storage across the M+K-1 nodes that exist after the worker nodes are added.

Preferably, referring to Fig. 7, the device further includes:

a replication module 17, configured to make P copies of the N segments and distribute them among the M-1 worker nodes.

Preferably, referring to Fig. 8, the first division module 11 includes:

a creation module 111, configured to create a distributed index;

an allocation module 112, configured to divide the table into multiple fragments according to the structure of the distributed index;

a third storage module 113, configured to divide the large-data-volume remote sensing image data into N segments and add them to the corresponding fragments of the table.

Preferably, the distributed index includes a distributed B-Tree and/or a hash index.

In summary, the present application provides at least one of the following beneficial technical effects:

1. It supports efficient storage of large-scale data. For spatial remote sensing metadata, the data can be sharded horizontally and different data stored on different nodes, which avoids single points of failure and data skew and improves the reliability and scalability of the whole system.

2. Data can be distributed to different nodes by spatial hashing. For spatial remote sensing metadata, a spatial hash algorithm can hash the data by spatial coordinates so that adjacent data is assigned to the same node. This improves query efficiency, reduces network overhead and improves the performance of the whole system.

3. The number of nodes and the data sharding can be adjusted in real time as the data volume grows, which helps ensure the stability and reliability of the system.

The above are all preferred embodiments of the present application and do not limit its scope of protection. Unless specifically stated otherwise, any feature disclosed in this specification (including the abstract and drawings) may be replaced by another equivalent feature or an alternative feature serving a similar purpose. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.

Claims (10)

1. A method for storing large-data-volume remote sensing image data, characterized in that it comprises: dividing the data into N segments; dividing the storage space into M nodes; setting one of the M nodes as a configuration node, and setting the remaining M-1 nodes as worker nodes that store the large-data-volume remote sensing image data; using the configuration node to store the N segments of the data on the corresponding M-1 worker nodes.

2. The method according to claim 1, characterized in that it further comprises: adding K worker nodes; having the configuration node allocate data storage across the M+K-1 nodes that exist after the worker nodes are added.

3. The method according to claim 1, characterized in that it further comprises: making P copies of the N segments and distributing them among the M-1 worker nodes.

4. The method according to claim 1, characterized in that dividing the data into N segments comprises: creating a distributed index; dividing the table into multiple fragments according to the structure of the distributed index; dividing the large-data-volume remote sensing image data into N segments and adding them to the corresponding fragments of the table.

5. The method according to claim 4, characterized in that the distributed index comprises a distributed B-Tree and/or a hash index.

6. A storage device for large-data-volume remote sensing image data, characterized in that it comprises: a first division module, configured to divide the large-data-volume remote sensing image data into N segments; a second division module, configured to divide the storage space into M nodes; a setting module, configured to set one of the M nodes as a configuration node and the remaining M-1 nodes as worker nodes that store the large-data-volume remote sensing image data; a first storage module, configured to use the configuration node to store the N segments of the data on the corresponding M-1 worker nodes.

7. The device according to claim 6, characterized in that it further comprises: an adding module, configured to add K worker nodes; a second storage module, configured to have the configuration node allocate data storage across the M+K-1 nodes that exist after the worker nodes are added.

8. The device according to claim 6, characterized in that it further comprises: a replication module, configured to make P copies of the N segments and distribute them among the M-1 worker nodes.

9. The device according to claim 6, characterized in that the first division module comprises: a creation module, configured to create a distributed index; an allocation module, configured to divide the table into multiple fragments according to the structure of the distributed index; a third storage module, configured to divide the large-data-volume remote sensing image data into N segments and add them to the corresponding fragments of the table.

10. The device according to claim 9, characterized in that the distributed index comprises a distributed B-Tree and/or a hash index.
CN202310667083.9A 2023-06-06 2023-06-06 Method and device for storing large-data-volume remote sensing image data Pending CN116644202A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310667083.9A CN116644202A (en) 2023-06-06 2023-06-06 Method and device for storing large-data-volume remote sensing image data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310667083.9A CN116644202A (en) 2023-06-06 2023-06-06 Method and device for storing large-data-volume remote sensing image data

Publications (1)

Publication Number Publication Date
CN116644202A (en) 2023-08-25

Family

ID=87624496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310667083.9A Pending CN116644202A (en) 2023-06-06 2023-06-06 Method and device for storing large-data-volume remote sensing image data

Country Status (1)

Country Link
CN (1) CN116644202A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200210421A1 (en) * 2018-12-29 2020-07-02 Wuhan University Method of storing remote sensing big data in hbase database
CN113778341A (en) * 2021-09-17 2021-12-10 北京航天泰坦科技股份有限公司 Distributed storage method and device for remote sensing data and remote sensing data reading method
CN114338718A (en) * 2021-12-21 2022-04-12 浙江大学 Distributed storage method, device and medium for massive remote sensing data

Similar Documents

Publication Publication Date Title
Chu et al. Distributed data deduplication
US10002148B2 (en) Memory-aware joins based in a database cluster
US9442673B2 (en) Method and apparatus for storing data using a data mapping algorithm
EP3314477B1 (en) Systems and methods for parallelizing hash-based operators in smp databases
US8543621B2 (en) Database partitioning by virtual partitions
US8140498B2 (en) Distributed database system by sharing or replicating the meta information on memory caches
CN108600321A (en) A kind of diagram data storage method and system based on distributed memory cloud
US8682874B2 (en) Information processing system
EP3260995A1 (en) Clustering layers in multi-node clusters
US9800575B1 (en) Assigning storage responsibility in a distributed data storage system with replication
US11095715B2 (en) Assigning storage responsibility in a distributed data storage system with replication
CN104516967A (en) Electric power system mass data management system and use method thereof
CN104750757B (en) A kind of date storage method and equipment based on HBase
US10509803B2 (en) System and method of using replication for additional semantically defined partitioning
CN110147407A (en) A kind of data processing method, device and Database Administration Server
CN111767287A (en) Data import method, device, device and computer storage medium
CN107590257A (en) A kind of data base management method and device
US20170270149A1 (en) Database systems with re-ordered replicas and methods of accessing and backing up databases
US12216622B2 (en) Supporting multiple fingerprint formats for data file segment
US20080288563A1 (en) Allocation and redistribution of data among storage devices
Liroz-Gistau et al. Dynamic workload-based partitioning algorithms for continuously growing databases
US9292559B2 (en) Data distribution/retrieval using multi-dimensional index
CN111309260B (en) Data storage node selection method
US8819017B2 (en) Affinitizing datasets based on efficient query processing
CN116644202A (en) Method and device for storing large-data-volume remote sensing image data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240223

Address after: Room 105, 1st Floor, Building 5, No. 8 Dongbei Wangxi Road, Haidian District, Beijing, 100193

Applicant after: Yizhirui Information Technology Co.,Ltd.

Country or region after: China

Address before: 601, Unit 6, 3rd Floor, No. 25 Shangdi East Road, Haidian District, Beijing, 100089

Applicant before: Beijing Jietai Yunji Information Technology Co.,Ltd.

Country or region before: China

RJ01 Rejection of invention patent application after publication

Application publication date: 20230825
