Background Art
The spatial index for massive spatio-temporal data is a spatial data index constructed around the retrieval and query characteristics of such data. Retrieved data are continuous in the geographic space dimension and the sampling time dimension, and a basic query involves a single value in the data class dimension. Based on these characteristics, the massive spatio-temporal data are partitioned in the multidimensional space formed by the data class, geographic space, and sampling time dimensions to construct spatio-temporal data shards. Given the single-valued data class dimension in basic queries and the statistical properties of the spatio-temporal data, all data of the same data class and the same year are divided into data partitions, the minimum bounding rectangle of each local partition is calculated, and the key attribute vector of each spatio-temporal data shard is constructed. Finally, the key attribute vectors of the shards from all local regions are collected and merged to build a global R-tree spatial data index. In summary, the spatial data index for massive spatio-temporal data is a single-level R-tree spatial data index based on spatial data object shards.
A second-level R-tree spatial data index is not constructed inside each spatial data object shard because the data scale of massive spatio-temporal data is huge and the search range is not fixed. During retrieval and query, the amount of data the memory cache can hold is limited, so the data shards touched by a query cannot reside permanently in the memory of the distributed system. If a second-level R-tree were constructed inside each spatial data object shard, then whenever the cache is invalidated the retrieval and query process would necessarily involve serializing and deserializing the R-tree spatial index, which increases the cost of loading and distributing data, and the cost of the subsequent query process would depend on the specific algorithm implementation. Therefore, the spatial data index for massive spatio-temporal data is designed as a single-level spatio-temporal data index.
Summary of the invention
The purpose of the present invention is to provide a distributed organization and query processing method for multidimensional time-varying field data in marine geographic space. The beneficial effect of the invention is that this distributed organization and query processing method is simple and efficient, and improves computational efficiency.
The technical solution adopted by the invention comprises the following steps:
1) according to the data class and sampling time point of the spatio-temporal data to be processed, dividing the spatio-temporal data to obtain original spatio-temporal data files, each covering a single data class and a single year;
2) using the data block size of the distributed storage system as the partition size parameter, and calculating from the partition size parameter the number of partitions for the spatio-temporal data of a single data class and a single year;
3) designing an R-tree-based distributed spatial index algorithm.
Further, in step 2), the longitude-latitude grid and the depth layers of the spatio-temporal data are divided according to the number of partitions, the partition boundaries are calculated, and the spatio-temporal data of a single data class and a single year are split into spatio-temporal data shard files, which are saved to the distributed storage system, completing the spatio-temporal data partitioning process.
Further, the design of the R-tree-based distributed spatial index algorithm in step 3) involves spatial data object shards, that is, spatio-temporal data shard files. The basic data structure of an R-tree node is designed according to the spatial data object shard; the spatial extent information of the spatial object shards is computed in a distributed manner; the spatial extent information is collected; and the spatio-temporal data index is built centrally.
An R-tree spatial data index requires a multidimensional minimum bounding rectangle. The key attribute vector of the spatio-temporal data contains five dimensions: longitude, latitude, depth, sampling time point, and data class. A five-dimensional minimum bounding rectangle is established from the data in each spatio-temporal data shard file. This rectangle is the spatial extent information of the spatial data object shard and serves as the minimum bounding rectangle of the R-tree node. The access path of the spatio-temporal data shard file on the storage platform serves as the data index description of the R-tree node. The minimum bounding rectangle and the data index description together constitute the index information description of the spatial data object shard.
In the distributed statistics stage, each spatio-temporal data shard file is treated as one data partition. In the distributed computing system, the maximum and minimum values of each dimension of the spatio-temporal data key vectors in the data partition are computed and aggregated within the current data partition, generating the five-dimensional minimum bounding rectangle of the spatio-temporal data shard file corresponding to that partition.
In the collection and centralized construction stage, the master node collects the five-dimensional minimum bounding rectangles of the spatio-temporal data shard files generated in the distributed statistics stage, together with the file paths of those shard files on the distributed storage system, and establishes the set of index information descriptions of the spatial data object shards. Each index information description in the set is traversed and inserted into the R-tree as a spatial data object of the R-tree. When the traversal and insertion finish, construction of the R-tree spatial data index is complete and the algorithm terminates.
Detailed Description of the Embodiments
The present invention is described in detail below with reference to specific embodiments.
Partitioning, that is, the design of the shard file size, depends on the specific distributed storage system. The present invention uses the massive spatio-temporal data distributed storage platform described below as the distributed storage system for massive spatio-temporal data. Therefore, the size of a spatio-temporal data shard file is determined by the data block size of that distributed storage platform in the concrete implementation. If the shard file size is much smaller than the actual data block size of the distributed storage system, the small-files problem of the distributed storage system arises, which harms the efficiency and scalability of the distributed storage system. If the shard file size is larger than the actual data block size of the distributed storage system, then a query executed on a single spatio-temporal data shard causes the data retrieval and query module to read multiple data blocks from the distributed storage system and scan more irrelevant data, which reduces the efficiency of data retrieval and query.
The design of the partitioning rules, that is, the basis for partitioning, depends on the basic data structure of the spatio-temporal spatial data objects and on the retrieval and query characteristics of spatio-temporal data processing. First, because a basic spatio-temporal data query involves a single value in the data class dimension, the value vectors of all spatial data objects in the same spatio-temporal data shard file share the same data class. Second, because the sampling time ranges of retrieval queries in spatio-temporal data processing are continuous, the partitioning scheme is further optimized by aggregating spatio-temporal data with the same sampling year in the same shard file. Finally, in the geographic space dimension, spatio-temporal data are organized and stored as geospatial grid data. Therefore, when the original spatio-temporal data file of one data class and one year is partitioned, the number of partitions is calculated from the size of the original file and the partition size; the longitude-latitude grid and the depth layers of the ocean spatio-temporal data of that data class and year are divided according to the number of partitions; the partition boundaries of each spatio-temporal data shard file are calculated; the original file is split along the partition boundaries to generate the spatio-temporal data shard files; and the generated shard files are saved in the distributed storage system.
The spatio-temporal data partitioning scheme may take the following steps:
1) according to the data class and sampling time point of the spatio-temporal data to be processed, divide the spatio-temporal data to obtain original spatio-temporal data files, each covering a single data class and a single year;
2) using the data block size of the distributed storage system as the partition size parameter, calculate from the partition size parameter the number of partitions for the spatio-temporal data of a single data class and a single year.
The longitude-latitude grid and the depth layers of the spatio-temporal data are divided according to the number of partitions, the partition boundaries are calculated, and the spatio-temporal data of a single data class and a single year are split into spatio-temporal data shard files, which are saved to the distributed storage system, completing the ocean spatio-temporal data partitioning process.
3) design of the R-tree-based distributed spatial index algorithm.
A spatial data object shard is a spatio-temporal data shard file. The basic data structure of an R-tree node is designed according to the spatial data object shard, the spatial extent information of the spatial object shards is computed in a distributed manner, the spatial extent information is collected, and the spatio-temporal data index is built centrally.
An R-tree spatial data index requires a multidimensional minimum bounding rectangle. The key attribute vector of the spatio-temporal data contains five dimensions: longitude, latitude, depth, sampling time point, and data class. Therefore, establishing the R-tree spatial data index of the spatio-temporal data requires establishing a five-dimensional minimum bounding rectangle from the data in each spatio-temporal data shard file. This rectangle is the spatial extent information of the spatial data object shard and serves as the minimum bounding rectangle of the R-tree node. The access path of the spatio-temporal data shard file on the storage platform serves as the data index description of the R-tree node. The minimum bounding rectangle and the data index description together constitute the index information description of the spatial data object shard.
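The index information description can be pictured as a small record that pairs a five-dimensional rectangle with a file path. The sketch below is one possible layout, not the structure prescribed by the invention: the class names, the numeric encoding of sampling time and data class, and the example path are all assumptions.

```python
from dataclasses import dataclass

# Order of the five dimensions in every bound tuple (assumed convention).
DIMENSIONS = ("longitude", "latitude", "depth", "sampling_time", "data_class")

@dataclass(frozen=True)
class MBR5:
    """Five-dimensional minimum bounding rectangle with inclusive bounds."""
    lower: tuple
    upper: tuple

    def __post_init__(self):
        if len(self.lower) != 5 or len(self.upper) != 5:
            raise ValueError("expected one bound per entry in DIMENSIONS")
        if any(lo > hi for lo, hi in zip(self.lower, self.upper)):
            raise ValueError("lower bound exceeds upper bound")

    def intersects(self, other):
        """True if the two rectangles overlap in every dimension."""
        return all(a_lo <= b_hi and b_lo <= a_hi
                   for a_lo, a_hi, b_lo, b_hi
                   in zip(self.lower, self.upper, other.lower, other.upper))

@dataclass(frozen=True)
class ShardIndexEntry:
    """Index information description: rectangle plus shard file access path."""
    mbr: MBR5
    path: str

# Hypothetical shard: data class 1, year 2010 (time encoded as fractional year).
entry = ShardIndexEntry(
    MBR5((0.0, -90.0, 0.0, 2010.0, 1), (90.0, 90.0, 5000.0, 2010.99, 1)),
    "/ocean/temperature/2010/part-0",
)
query = MBR5((10.0, 0.0, 0.0, 2010.2, 1), (20.0, 10.0, 100.0, 2010.5, 1))
relevant = entry.mbr.intersects(query)
```

Encoding the data class as an integer lets it share the same comparison logic as the continuous dimensions; since each shard holds a single data class, its lower and upper bounds coincide in that dimension.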
In the distributed statistics stage, each spatio-temporal data shard file is treated as one data partition. In the distributed computing system, the maximum and minimum values of each dimension of the spatio-temporal data key vectors in the data partition are computed and aggregated within the current data partition, generating the five-dimensional minimum bounding rectangle of the spatio-temporal data shard file corresponding to that partition.
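The per-partition statistics amount to a minimum and maximum reduction over each of the five dimensions. The following sketch runs that reduction locally in plain Python as a stand-in for a distributed computing system, which the source does not name; the function names, the record layout `(longitude, latitude, depth, sampling_time, data_class)`, and the example paths are assumptions.

```python
def shard_mbr(records):
    """Return (lower, upper) bounds over all key vectors in one shard."""
    records = iter(records)
    try:
        first = next(records)
    except StopIteration:
        raise ValueError("cannot compute an MBR for an empty shard") from None
    lower, upper = list(first), list(first)
    for record in records:
        for d, value in enumerate(record):
            if value < lower[d]:
                lower[d] = value
            if value > upper[d]:
                upper[d] = value
    return tuple(lower), tuple(upper)

def distributed_statistics(shards):
    """shards maps a shard file path to an iterable of key vectors.
    In a real deployment each shard would be reduced on a worker node;
    here the loop simply runs in one process."""
    return [(path, *shard_mbr(records)) for path, records in shards.items()]

# Hypothetical data: two shards of data class 1 sampled in 2010.
stats = distributed_statistics({
    "/ocean/temperature/2010/part-0": [
        (10.0, -5.0, 0.0, 2010.1, 1),
        (12.5, -7.0, 50.0, 2010.4, 1),
    ],
    "/ocean/temperature/2010/part-1": [
        (95.0, 3.0, 10.0, 2010.3, 1),
    ],
})
```

Each element of `stats` carries exactly what the master node needs in the next stage: the shard file path and the lower and upper corners of its five-dimensional minimum bounding rectangle.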
In the collection and centralized construction stage, the master node collects the five-dimensional minimum bounding rectangles of the spatio-temporal data shard files generated in the distributed statistics stage, together with the file paths of those shard files on the distributed storage system, and establishes the set of index information descriptions of the spatial data object shards. Each index information description in the set is traversed and inserted into the R-tree as a spatial data object of the R-tree. When the traversal and insertion finish, construction of the R-tree spatial data index is complete and the algorithm terminates.
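The traverse-and-insert construction can be illustrated with a deliberately small R-tree. This is a simplified sketch rather than the algorithm of the invention: the node capacity, the choice of subtree by least margin enlargement, and the split by sorting along one axis are all assumptions. Margin (the sum of side lengths) is used instead of volume because every shard has zero extent in the data class dimension, which would make every volume zero.

```python
MAX_ENTRIES = 4  # assumed node capacity

def _union(boxes):
    lowers, uppers = zip(*boxes)
    return tuple(map(min, zip(*lowers))), tuple(map(max, zip(*uppers)))

def _margin(box):
    return sum(hi - lo for lo, hi in zip(*box))

class Node:
    def __init__(self, leaf, entries=None):
        self.leaf = leaf
        self.entries = entries or []  # (box, shard path) or (box, child Node)

    def box(self):
        return _union([b for b, _ in self.entries])

def _split(node):
    """Halve an overfull node along the axis with the widest spread."""
    boxes = [b for b, _ in node.entries]
    axis = max(range(len(boxes[0][0])),
               key=lambda d: max(b[0][d] for b in boxes) - min(b[0][d] for b in boxes))
    ordered = sorted(node.entries, key=lambda e: e[0][0][axis])
    mid = len(ordered) // 2
    node.entries = ordered[:mid]
    return Node(node.leaf, ordered[mid:])

def _insert(node, box, path):
    """Insert into a subtree; return the new sibling if the node split."""
    if node.leaf:
        node.entries.append((box, path))
    else:
        i = min(range(len(node.entries)),
                key=lambda k: _margin(_union([node.entries[k][0], box]))
                - _margin(node.entries[k][0]))
        child = node.entries[i][1]
        sibling = _insert(child, box, path)
        node.entries[i] = (child.box(), child)
        if sibling:
            node.entries.append((sibling.box(), sibling))
    return _split(node) if len(node.entries) > MAX_ENTRIES else None

def build_index(descriptions):
    """descriptions: iterable of ((lower, upper), shard_path) pairs."""
    root = Node(leaf=True)
    for box, path in descriptions:
        sibling = _insert(root, box, path)
        if sibling:  # the root split, so the tree grows one level
            root = Node(False, [(root.box(), root), (sibling.box(), sibling)])
    return root

def search(node, query):
    """Paths of all shards whose rectangle intersects the query box."""
    q_lo, q_hi = query
    hits = []
    for (lo, hi), payload in node.entries:
        if all(l <= qh and ql <= h for l, h, ql, qh in zip(lo, hi, q_lo, q_hi)):
            hits.extend([payload] if node.leaf else search(payload, query))
    return hits
```

A query then reduces to calling `search` with a five-dimensional box and loading only the shard files whose paths are returned; the tree itself stores no data beyond rectangles and paths, in keeping with the single-level design.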
The above is only a preferred embodiment of the present invention and does not limit the present invention in any form. Any simple modification, equivalent change, or variation made to the above embodiment according to the technical essence of the present invention falls within the scope of the technical solution of the present invention.