CN109241236A - Distributed organization and query processing method of marine geospatial multidimensional time-varying field data - Google Patents

Info

Publication number
CN109241236A
Authority
CN
China
Legal status
Pending
Application number
CN201811200131.9A
Other languages
Chinese (zh)
Inventor
秦勃
夏海涛
王云鹏
张书尧
Current Assignee
Ocean University of China
Original Assignee
Ocean University of China
Priority date
2018-10-16
Filing date
2018-10-16
Publication date
2019-01-18
Application filed by Ocean University of China
Priority to CN201811200131.9A
Publication of CN109241236A
Pending (current legal status)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed organization and query processing method for marine geospatial multidimensional time-varying field data. Marine spatio-temporal data are divided according to the data types being processed and their sampling time points, yielding original marine spatio-temporal data files that each contain a single data type for a single year. The data block size of the distributed storage system is taken as the partition size parameter, and the number of partitions of each single-type, single-year data set is calculated from it. Finally, a distributed spatial index algorithm based on the R-tree is designed. The beneficial effect of the invention is that the method is simple and efficient and improves computational efficiency.

Description

Distributed organization and query processing method of marine geospatial multidimensional time-varying field data
Technical field
The invention belongs to the technical field of data processing and relates to a distributed organization and query processing method for marine geospatial multidimensional time-varying field data.
Background art
The spatial index of massive marine spatio-temporal data is a spatial data index built around the retrieval and query characteristics of such data. Queried data are continuous in the geospatial and sampling-time dimensions, whereas a basic query involves only a single data type. Exploiting these features, the massive marine spatio-temporal data are divided in the multidimensional space spanned by the data-type, geospatial and sampling-time dimensions to construct marine spatio-temporal data slices. Because a basic query concerns a single data type, and in view of the statistical properties of marine spatio-temporal data, all data of the same data type and the same year are partitioned together, the minimum bounding rectangle (MBR) of each local partition is computed, and the key-attribute vector of each slice is constructed. Finally, the key-attribute vectors of the slices of all local regions are collected and merged to build a global R-tree spatial data index. In summary, the spatial data index of massive marine spatio-temporal data is a single-level R-tree index over spatial data object slices.

No second-level R-tree is built inside each spatial data object slice because the data volume of massive marine spatio-temporal data is huge and the query domain is not known in advance. During retrieval, the amount of data that can be cached in memory is limited, so the slices being queried cannot reside permanently in the memory of the distributed system. If a second-level R-tree were built inside each slice, every cache miss would force the retrieval process to serialize and deserialize that R-tree, increasing the cost of loading and distributing data, and the cost of the subsequent query would depend on the specific algorithm implementation. Therefore, the spatial data index of massive marine spatio-temporal data is designed as a single-level spatio-temporal index.
Summary of the invention
The purpose of the present invention is to provide a distributed organization and query processing method for marine geospatial multidimensional time-varying field data. The beneficial effect of the invention is that the method is simple and efficient and improves computational efficiency.
The technical scheme adopted by the invention comprises the following steps:
1) According to the data types being processed and the sampling time points, divide the marine spatio-temporal data to obtain original marine spatio-temporal data files, each containing a single data type for a single year;
2) Taking the data block size of the distributed storage system as the partition size parameter, calculate from it the number of partitions of the marine spatio-temporal data of each single data type and single year;
3) Design a distributed spatial index algorithm based on the R-tree.
Further, in step 2), the longitude-latitude grid and the depth layers of the marine spatio-temporal data are divided according to the number of partitions and the partition boundaries are calculated; the single-type, single-year data are split along these boundaries into marine spatio-temporal data slice files, which are saved to the distributed storage system, completing the partitioning process.
Further, the R-tree-based distributed spatial index algorithm of step 3) operates on spatial data object slices, i.e., the marine spatio-temporal data slice files. It designs the data structure of the R-tree nodes according to the spatial data object slices, computes the spatial extent of each spatial object slice in a distributed manner, collects the slice information, and builds the spatio-temporal data index centrally.
The R-tree spatial data index requires a multidimensional minimum bounding rectangle (MBR). The key-attribute vector of marine spatio-temporal data comprises five dimensions: longitude, latitude, depth, sampling time point and data type. A five-dimensional MBR is established from the data in each slice file; this MBR is the spatial extent of the spatial data object slice and serves as the MBR of the R-tree node. The access path of the slice file on the storage platform serves as the data index description of the R-tree node, and together the MBR and the data index description form the index information description of the spatial data object slice. In the distributed statistics stage, each slice file is treated as one data partition in the distributed computing system; the maximum and minimum of every dimension of the key vectors in the partition are computed and aggregated within the current partition, producing the five-dimensional MBR of the slice file to which the partition belongs. In the collection and centralized construction stage, the master node collects the five-dimensional MBRs produced in the distributed statistics stage together with the file paths of the slice files on the distributed storage system, and builds the set of index information descriptions of the spatial data object slices. It traverses this set and inserts each index information description into the R-tree as a spatial data object at the R-tree level; when the traversal and insertion finish, the R-tree spatial data index is complete and the algorithm terminates.
Detailed description of the embodiments
The present invention is described in detail below with reference to embodiments.
The design of the partitions, i.e., the size of the slice files, depends on the specific distributed storage system. The present invention uses the massive marine spatio-temporal data distributed storage platform described later as the distributed storage system for massive marine spatio-temporal data; the size of the slice files is therefore determined by the data block size of that platform in the concrete implementation. If the slice files are much smaller than the actual data block size of the distributed storage system, the small-file problem of the distributed storage system arises, degrading its efficiency and scalability. If the slice files are larger than the actual data block size, a query executed on a single slice forces the data retrieval module to read several data blocks from the distributed storage system and scan more irrelevant data, reducing retrieval efficiency.
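The sizing rule above reduces to integer ceiling division. The sketch below is a minimal illustration, assuming a 128 MiB block (the HDFS default since Hadoop 2.x; the platform referred to above may use another value) and an even split of each original file. The helper names `partition_count`, `slice_bytes` and `blocks_per_slice` are hypothetical.

```python
MiB = 1024 * 1024
# Assumption: 128 MiB, the HDFS default block size since Hadoop 2.x.
DEFAULT_BLOCK_BYTES = 128 * MiB


def ceil_div(a: int, b: int) -> int:
    """Exact integer ceiling of a / b for positive integers (no float rounding)."""
    return (a + b - 1) // b


def partition_count(file_bytes: int, block_bytes: int = DEFAULT_BLOCK_BYTES) -> int:
    """Partitions needed so that, with an even split, no slice is larger than one block."""
    if file_bytes <= 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    return ceil_div(file_bytes, block_bytes)


def slice_bytes(file_bytes: int, n_partitions: int) -> int:
    """Size of each slice file when the original file is split evenly."""
    return ceil_div(file_bytes, n_partitions)


def blocks_per_slice(slice_size: int, block_bytes: int = DEFAULT_BLOCK_BYTES) -> int:
    """Blocks read when a query scans one slice; each file starts on a fresh block."""
    return ceil_div(slice_size, block_bytes)
```

For a 1000 MiB single-type, single-year file this gives 8 partitions of 125 MiB, each fitting in one block; a 200 MiB slice would instead force every query on it to read 2 blocks, which is the oversized case described above.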
The design of the partitioning rules, i.e., the basis for partitioning, depends on the basic data structure of the marine spatio-temporal spatial data objects and on the retrieval and query characteristics of marine spatio-temporal data processing. First, because a basic query of marine spatio-temporal data involves a single data type, the value vectors of all spatial data objects in one slice file share the same data type. Second, because the sampling times queried in marine spatio-temporal data processing are continuous within an interval, the partitioning scheme is further refined so that each slice file aggregates data with the same sampling year. Finally, marine spatio-temporal data are stored as geospatial grid data in the geospatial dimensions. Therefore, when the original file of one data type and one year is divided, the number of partitions is calculated from the size of the original file and the partition size; the longitude-latitude grid and the depth layers of that data are divided according to the number of partitions; the partition boundaries of each slice file are calculated; the original file is split along these boundaries into slice files; and the generated slice files are saved to the distributed storage system.
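The first two partitioning rules (one data type per slice, one sampling year per slice) amount to grouping observations by the key (data type, year). A minimal in-memory sketch follows, assuming a hypothetical `Record` layout that the patent does not specify; in practice each group would be written out as one original file rather than held in memory.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical layout of one gridded observation (not specified in the patent)."""
    data_type: str   # e.g. "temperature" or "salinity"
    time: datetime   # sampling time point
    lon: float
    lat: float
    depth: float
    value: float


def split_by_type_and_year(records: Iterable[Record]) -> Dict[Tuple[str, int], List[Record]]:
    """Group records so that each group holds one data type for one sampling year."""
    groups: Dict[Tuple[str, int], List[Record]] = defaultdict(list)
    for rec in records:
        groups[(rec.data_type, rec.time.year)].append(rec)
    return dict(groups)
```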
The marine spatio-temporal data partitioning scheme can take the following steps:
1) According to the data types being processed and the sampling time points, divide the marine spatio-temporal data to obtain original data files, each containing a single data type for a single year;
2) Taking the data block size of the distributed storage system as the partition size parameter, calculate from it the number of partitions of the marine spatio-temporal data of each single data type and single year.
The longitude-latitude grid and the depth layers of the marine spatio-temporal data are divided according to the number of partitions and the partition boundaries are calculated; the single-type, single-year data are split into slice files, which are saved to the distributed storage system, completing the partitioning process.
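The patent does not state how the number of partitions is spread over the longitude, latitude and depth axes. The sketch below shows one possible choice, not the patented rule: the partition count is factored into primes, each factor is assigned greedily to the axis whose pieces are currently largest, and each axis is then cut into near-equal contiguous index ranges. All function names are hypothetical, and the grid shape is given as (n_lon, n_lat, n_depth) in cells.

```python
import itertools
from typing import List, Sequence, Tuple

Range = Tuple[int, int]  # half-open index range [start, stop)


def prime_factors(n: int) -> List[int]:
    """Prime factors of n in descending order, e.g. 12 -> [3, 2, 2]."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return sorted(factors, reverse=True)


def choose_splits(n_parts: int, shape: Sequence[int]) -> List[int]:
    """Number of cuts per axis; each prime factor goes to the axis with the largest pieces."""
    splits = [1] * len(shape)
    for p in prime_factors(n_parts):
        fits = [i for i in range(len(shape)) if splits[i] * p <= shape[i]]
        if not fits:
            raise ValueError(f"cannot split grid {tuple(shape)} into {n_parts} parts")
        axis = max(fits, key=lambda i: shape[i] / splits[i])
        splits[axis] *= p
    return splits


def even_ranges(length: int, parts: int) -> List[Range]:
    """Split [0, length) into `parts` contiguous ranges whose sizes differ by at most one."""
    base, extra = divmod(length, parts)
    ranges, start = [], 0
    for k in range(parts):
        stop = start + base + (1 if k < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def partition_boxes(n_parts: int, shape: Sequence[int]) -> List[Tuple[Range, ...]]:
    """Index-space boundaries (lon, lat, depth ranges) of every slice."""
    splits = choose_splits(n_parts, shape)
    per_axis = [even_ranges(length, s) for length, s in zip(shape, splits)]
    return list(itertools.product(*per_axis))
```

For a grid of shape (360, 180, 10) split into 6 partitions, this yields 3 longitude bands times 2 latitude bands, with all 10 depth layers kept together. The index ranges can be turned into coordinate boundaries by looking them up in the grid's longitude, latitude and depth coordinate arrays.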
3) Design of the R-tree-based distributed spatial index algorithm
Spatial data object slices are the marine spatio-temporal data slice files. The data structure of the R-tree nodes is designed according to the spatial data object slices; the spatial extent of each spatial object slice is computed in a distributed manner; the slice information is collected; and the spatio-temporal data index is built centrally.
The R-tree spatial data index requires a multidimensional minimum bounding rectangle (MBR). The key-attribute vector of marine spatio-temporal data comprises five dimensions: longitude, latitude, depth, sampling time point and data type. Building the R-tree index of marine spatio-temporal data therefore requires establishing a five-dimensional MBR from the data in each slice file. This MBR, which is the spatial extent of the spatial data object slice, serves as the MBR of the R-tree node, and the access path of the slice file on the storage platform serves as the data index description of the R-tree node. Together, the MBR and the data index description form the index information description of the spatial data object slice.
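The index information description pairs a five-dimensional minimum bounding rectangle with a storage path. A minimal sketch of these two structures follows. It assumes, since the patent does not say, that sampling time is encoded as POSIX seconds and that data types are mapped to hypothetical integer codes (`TYPE_CODES`), so that all five dimensions are numeric; timezone-aware datetimes are expected.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Tuple

# Order of the five key-attribute dimensions used in this sketch.
DIMS = ("lon", "lat", "depth", "time", "data_type")

# Hypothetical numeric codes for data types (the patent defines none).
TYPE_CODES: Dict[str, int] = {"temperature": 0, "salinity": 1, "current_u": 2, "current_v": 3}

Key5 = Tuple[float, float, float, float, float]


def encode_key(lon: float, lat: float, depth: float, t: datetime, data_type: str) -> Key5:
    """Numeric 5-D key vector of one observation; `t` should be timezone-aware."""
    return (lon, lat, depth, t.timestamp(), float(TYPE_CODES[data_type]))


@dataclass(frozen=True)
class MBR5:
    """Five-dimensional minimum bounding rectangle, ordered as in DIMS."""
    lo: Key5
    hi: Key5

    def __post_init__(self) -> None:
        if len(self.lo) != len(DIMS) or len(self.hi) != len(DIMS):
            raise ValueError("an MBR5 needs five lower and five upper bounds")
        if any(low > high for low, high in zip(self.lo, self.hi)):
            raise ValueError("a lower bound exceeds its upper bound")

    def intersects(self, other: "MBR5") -> bool:
        """True if the rectangles overlap (touching counts) in every dimension."""
        return all(a_lo <= b_hi and b_lo <= a_hi
                   for a_lo, a_hi, b_lo, b_hi in zip(self.lo, self.hi, other.lo, other.hi))


@dataclass(frozen=True)
class IndexEntry:
    """Index information description of one slice file."""
    mbr: MBR5   # spatial extent of the slice, used as the MBR of the R-tree node
    path: str   # access path of the slice file on the storage platform
```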
Distributed statistics stage: each slice file is treated as one data partition in the distributed computing system. Within each partition, the maximum and minimum of every dimension of the key vectors are computed and aggregated, producing the five-dimensional minimum bounding rectangle of the slice file to which the current partition belongs.
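The distributed statistics stage is a per-partition min/max reduction. In the sketch below a `ThreadPoolExecutor` stands in for the distributed computing system; on Spark, for example, `partition_mbr` could be applied per partition with `RDD.mapPartitions` after wrapping it to yield its single result. The slice-file paths used as dictionary keys are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

Key5 = Tuple[float, float, float, float, float]  # (lon, lat, depth, time, type_code)
Bounds = Tuple[Key5, Key5]                        # (per-dimension minima, per-dimension maxima)


def partition_mbr(rows: Iterable[Key5]) -> Bounds:
    """One pass over a partition: per-dimension min and max of the key vectors."""
    lo: List[float] = []
    hi: List[float] = []
    for row in rows:
        if not lo:
            lo, hi = list(row), list(row)  # two independent copies of the first row
            continue
        for d, v in enumerate(row):
            if v < lo[d]:
                lo[d] = v
            elif v > hi[d]:
                hi[d] = v
    if not lo:
        raise ValueError("an empty partition has no MBR")
    return tuple(lo), tuple(hi)


def slice_mbrs(partitions: Dict[str, List[Key5]], workers: int = 4) -> Dict[str, Bounds]:
    """Compute every slice file's MBR in parallel; keys are slice-file paths."""
    paths = list(partitions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bounds = list(pool.map(partition_mbr, (partitions[p] for p in paths)))
    return dict(zip(paths, bounds))
```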
Collection and centralized construction stage: the master node collects the five-dimensional minimum bounding rectangles of the slice files generated in the distributed statistics stage, together with the file paths of the slice files on the distributed storage system, and builds the set of index information descriptions of the spatial data object slices. It then traverses the set and inserts each index information description into the R-tree as a spatial data object at the R-tree level. When the traversal and insertion finish, the R-tree spatial data index is complete and the algorithm terminates.
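The collection and construction stage inserts one (MBR, path) entry per slice file into an R-tree held on the master node. The sketch below is a minimal in-memory R-tree, not the patent's implementation. It chooses the subtree with the least enlargement, measured by margin (sum of side lengths) rather than the classic area so that degenerate dimensions such as a single data type do not zero out the measure, and it replaces Guttman's quadratic split with a simple median split along the widest dimension. A `search` method shows how a query box is resolved to slice-file paths.

```python
from typing import List, Optional, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)


def union(a: Box, b: Box) -> Box:
    return tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1]))


def margin(b: Box) -> float:
    # Sum of side lengths; stays informative when some dimension has lower == upper.
    return sum(hi - lo for lo, hi in zip(b[0], b[1]))


def intersects(a: Box, b: Box) -> bool:
    return all(alo <= bhi and blo <= ahi
               for alo, ahi, blo, bhi in zip(a[0], a[1], b[0], b[1]))


class Node:
    def __init__(self, leaf: bool):
        self.leaf = leaf
        self.entries: list = []  # (box, slice path) in leaves, (box, child Node) otherwise

    def box(self) -> Box:
        result = self.entries[0][0]
        for b, _ in self.entries[1:]:
            result = union(result, b)
        return result


class RTree:
    """Minimal R-tree: least-enlargement insertion and a simplified median split."""

    def __init__(self, max_entries: int = 8):
        self.max_entries = max_entries
        self.root = Node(leaf=True)

    def insert(self, box: Box, path: str) -> None:
        sibling = self._insert(self.root, box, path)
        if sibling is not None:  # the root split: grow the tree by one level
            old = self.root
            self.root = Node(leaf=False)
            self.root.entries = [(old.box(), old), (sibling.box(), sibling)]

    def _insert(self, node: Node, box: Box, path: str) -> Optional[Node]:
        if node.leaf:
            node.entries.append((box, path))
        else:
            i = min(range(len(node.entries)),
                    key=lambda k: margin(union(node.entries[k][0], box)) - margin(node.entries[k][0]))
            child = node.entries[i][1]
            sibling = self._insert(child, box, path)
            node.entries[i] = (child.box(), child)  # refresh the enlarged child box
            if sibling is not None:
                node.entries.append((sibling.box(), sibling))
        return self._split(node) if len(node.entries) > self.max_entries else None

    def _split(self, node: Node) -> Node:
        # Sort entries by their centre along the widest dimension and cut in half.
        lo, hi = node.box()
        dim = max(range(len(lo)), key=lambda d: hi[d] - lo[d])
        node.entries.sort(key=lambda e: e[0][0][dim] + e[0][1][dim])
        half = len(node.entries) // 2
        sibling = Node(node.leaf)
        node.entries, sibling.entries = node.entries[:half], node.entries[half:]
        return sibling

    def search(self, query: Box) -> List[str]:
        """Paths of all slice files whose MBR intersects the query box."""
        found, stack = [], [self.root]
        while stack:
            node = stack.pop()
            for b, item in node.entries:
                if intersects(b, query):
                    if node.leaf:
                        found.append(item)
                    else:
                        stack.append(item)
        return found
```

Because the index is single-level, a query only walks this tree to obtain the paths of candidate slice files; those files are then loaded from the distributed storage system and scanned, with no per-slice index to deserialize.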
The above is only a preferred embodiment of the present invention and does not limit the invention in any form. Any simple modification, equivalent change or variation made to the above embodiment according to the technical essence of the invention falls within the scope of the technical solution of the invention.

Claims (3)

1. A distributed organization and query processing method for marine geospatial multidimensional time-varying field data, characterized by comprising the following steps:
1) according to the data types being processed and the sampling time points, dividing the marine spatio-temporal data to obtain original marine spatio-temporal data files, each containing a single data type for a single year;
2) taking the data block size of the distributed storage system as the partition size parameter, and calculating from it the number of partitions of the marine spatio-temporal data of each single data type and single year;
3) designing an R-tree-based distributed spatial index algorithm.
2. The distributed organization and query processing method for marine geospatial multidimensional time-varying field data according to claim 1, characterized in that, in step 2), the longitude-latitude grid and the depth layers of the marine spatio-temporal data are divided according to the number of partitions and the partition boundaries are calculated; the single-type, single-year marine spatio-temporal data are split into marine spatio-temporal data slice files, which are saved to the distributed storage system, completing the marine spatio-temporal data partitioning process.
3. The distributed organization and query processing method for marine geospatial multidimensional time-varying field data according to claim 1, characterized in that the R-tree-based distributed spatial index algorithm of step 3) comprises: taking spatial data object slices, i.e., marine spatio-temporal data slice files; designing the data structure of the R-tree nodes according to the spatial data object slices; computing the spatial extent of each spatial object slice in a distributed manner; collecting the slice information; and building the spatio-temporal data index centrally;
the R-tree spatial data index requires a multidimensional minimum bounding rectangle; the key-attribute vector of marine spatio-temporal data comprises five dimensions, namely longitude, latitude, depth, sampling time point and data type; a five-dimensional minimum bounding rectangle is established from the data in each slice file as the spatial extent of the spatial data object slice and serves as the minimum bounding rectangle of the R-tree node; the access path of the slice file on the storage platform serves as the data index description of the R-tree node; the minimum bounding rectangle and the data index description together form the index information description of the spatial data object slice; in the distributed statistics stage, each slice file is treated as one data partition in the distributed computing system, the maximum and minimum of every dimension of the key vectors in the partition are computed and aggregated within the current partition, and the five-dimensional minimum bounding rectangle of the slice file to which the current partition belongs is generated; in the collection and centralized construction stage, the master node collects the five-dimensional minimum bounding rectangles of the slice files generated in the distributed statistics stage and the file paths of the slice files on the distributed storage system, builds the set of index information descriptions of the spatial data object slices, traverses the set, and inserts each index information description into the R-tree as a spatial data object at the R-tree level; when the traversal and insertion finish, construction of the R-tree spatial data index is complete and the algorithm terminates.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811200131.9A CN109241236A 2018-10-16 2018-10-16 Distributed organization and query processing method of marine geospatial multidimensional time-varying field data

Publications (1)

Publication Number Publication Date
CN109241236A 2019-01-18

Family

ID=65052293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811200131.9A Pending CN109241236A 2018-10-16 2018-10-16 Distributed organization and query processing method of marine geospatial multidimensional time-varying field data

Country Status (1)

Country Link
CN (1) CN109241236A

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103425772A (en) * 2013-08-13 2013-12-04 东北大学 Method for searching massive data with multi-dimensional information
CN105117497A (en) * 2015-09-28 2015-12-02 上海海洋大学 Ocean big data master-slave index system and method based on Spark cloud network
CN106933833A (en) * 2015-12-30 2017-07-07 中国科学院沈阳自动化研究所 A kind of positional information method for quickly querying based on Spatial Data Index Technology

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532255A (en) * 2019-05-20 2019-12-03 南京大学 The storage and retrieval of a kind of space-time data based on three-dimensional R tree and update method
CN110377624A (en) * 2019-07-31 2019-10-25 象辑知源(武汉)科技有限公司 A kind of storage and querying method to the geographic information data with multidimensional properties such as time and spaces
CN111078634A (en) * 2019-12-30 2020-04-28 中科海拓(无锡)科技有限公司 Distributed spatio-temporal data indexing method based on R tree
CN111078634B (en) * 2019-12-30 2023-07-25 中科海拓(无锡)科技有限公司 Distributed space-time data indexing method based on R tree
CN113901087A (en) * 2021-10-12 2022-01-07 大连海事大学 Pruning method for space big data partition repeated data
CN113901087B (en) * 2021-10-12 2024-05-10 大连海事大学 Pruning method for space big data partition repeated data

Similar Documents

Publication Publication Date Title
CN109241236A (en) Distributed organization and query processing method of marine geospatial multidimensional time-varying field data
KR101117709B1 (en) A method for multi-dimensional histograms using a minimal skew cover in a space partitioning tree and recording medium storing program for executing the same
CN103425772B (en) A kind of mass data inquiry method with multidimensional information
CN107798054B (en) A Trie-based range query method and device
CN110457315A (en) A method and system for analyzing group aggregation patterns based on user trajectory data
CN109635068A (en) Mass remote sensing data high-efficiency tissue and method for quickly retrieving under cloud computing environment
CN110297952B (en) Grid index-based parallelization high-speed railway survey data retrieval method
CN102289466A (en) K-nearest neighbor searching method based on regional coverage
US10019649B2 (en) Point cloud simplification
JP2014002519A (en) Spatiotemporal data management system, spatiotemporal data management method, and spatiotemporal data management program
CN108205562B (en) Positioning data storage and retrieval method and device for geographic information system
CN108182242A (en) A kind of indexing means for the inquiry of magnanimity multi dimensional numerical data area
CN112214485B (en) Power grid resource data organization and planning method based on global subdivision grid
CN109063194A (en) Data retrieval method and device based on space encoding
CN108446357A (en) A kind of mass data spatial dimension querying method based on two-dimentional geographical location
CN111813778B (en) Approximate keyword storage and query method for large-scale road network data
Han et al. Spatial keyword range search on trajectories
CN117033541B (en) Space-time knowledge graph indexing method and related equipment
CN113255610B (en) Feature base building method, feature retrieval method and related device
Vlachou et al. Efficient spatio-temporal RDF query processing in large dynamic knowledge bases
CN113076334A (en) Data query method, index generation device and electronic equipment
CN119760160B (en) A method for processing spatiotemporal graph data based on grid graph database
CN106095852A (en) Efficient query method for activity track
CN115438081A (en) Multi-stage aggregation and real-time updating method for massive ship position point clouds
CN110955656A (en) Vector data topological operation index optimization mechanism and construction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190118