Space-time keyword query method in mixed storage block chain environment
Technical Field
The invention belongs to the technical field of computer block chain query, and particularly relates to a space-time keyword query method in a mixed storage block chain environment.
Background
In recent years, with the increasing popularity of digital cryptocurrency, the underlying technology, namely blockchain technology, has become the focus of current social attention. Essentially, the blockchain can be regarded as a distributed database, has the characteristics of decentralization, non-tampering, traceability, data transparency and the like, and can well solve the trust problem among a plurality of participants in a supply chain and other scenes. The space-time data is a common data expression form in the fields of supply chains, the Internet of things and the like, and a large amount of space-time data is stored on the blockchain along with the application of the blockchain in the scenes. The space-time keyword query uses time, space and keywords as query conditions, and searches all information meeting the conditions, so that the space-time keyword query is widely applied to the above scenes.
The current commonly used blockchain storage mode is a mixed storage mode of a chain up-chain and a chain down-chain, operation transactions which can be shared by multiple parties are stored on the chain, and private data of all parties is stored under the chain. When the space-time keyword query is performed in this mode, the result after the link-up and link-down data connection needs to be returned. Because most of the existing query methods use B+ tree indexes in blocks, the query efficiency is lower for space-time keyword query with multi-dimensional search space. In addition, the existing method needs to read all data in the link table under the chain for connection during inquiry, so that a large amount of communication overhead is caused, and user experience is affected. Therefore, the research on the space-time keyword query method in the mixed storage block chain environment has important significance.
Taking a real scene as an example, in a supply chain scene, a plurality of participants store production information, logistics information, sales information and the like of the production information, the logistics information, the sales information and the like of the production information, the logistics information and the sales information are stored in a blockchain in a public transparent manner for multiparty data sharing and collaborative processing, and private data of each participant is stored under the chain. When the user sends the space-time keyword query Q shown in the figure, the transactions meeting the query conditions under the chain are required to be returned to the querying user after being connected through the connection attribute. The existing query method needs to query respectively according to the B+ tree index of each attribute column in the block, intersections are taken from query results to obtain all transactions meeting query conditions, then all data in the link table under the chain are read to be connected to the blockchain, and finally the connected results are returned to the query user. The drawbacks of this query approach are mainly the following three aspects:
first, query semantics are weaker. While transactions on a blockchain contain a variety of attributes, existing blockchain platforms typically store transactions in the form of key-value or bespoke semantics. The key-value mode does not support relational query such as selection, connection and the like, and can not query the transaction according to the attribute name, and the semantic-given storage mode needs to construct an independent index for each attribute column, and can not merge indexes according to attribute types.
Second, the query efficiency is low. The on-chain transactions include attributes such as time, space, and keywords, distributed among the blocks. Most of the existing blockchain query methods avoid traversing all transactions by using B+ tree indexes in the blocks, but the structure has low query efficiency for space-time keyword query with multidimensional search space.
Third, communication overhead is large. Because the data on the link and the data under the link need to be acquired simultaneously during the query, the existing method needs to read all the data in the link table under the link for connection, and when the amount of the data under the link is very large and the result of the connection is small, a great deal of unnecessary communication overhead is generated.
Disclosure of Invention
Aiming at the defects of the prior art, the invention designs a space-time keyword query method under a mixed storage block chain environment.
A space-time keyword query method under a mixed storage block chain environment comprises the following specific steps:
Step 1, constructing a block chain model which is CSBM classified according to attributes and endowed with semantics;
The method comprises the steps of 1.1, CSBM comprises a plurality of blocks CS-B classified according to attributes and endowed with semantics, CSBM=CS-B 1+CS-B2+···+CS-Bn, each CS-B comprises a Block head and a Block CS-B body classified according to attributes and endowed with semantics, wherein CS-B=B head+CS-B body, B head has the same structure as a traditional Block chain Block head and comprises a front Block hash value PrevHash for connecting blocks into a chain, a Block Height for recording the position of a current Block on the chain, a Timestamp for recording the generation time of the Block, mining difficulty Bits and random numbers Nonce, merkle Root in the Block head is replaced by a Root node BKM-tree Root of a BKM-tree constructed in step 2, all transactions in the Block are generated based on hash values of transactions in the Block, and the transactions in the CS-B body comprise attribute types and are added;
Defining T x as a transaction ,Tx={Key=v1,Time={time=v2},Coordinate={longitude=v3,latitude=v4},Keywords={kw1,kw2,···,kwn},Others={oth1,oth2,···,othn}},Key classified according to attributes and endowed with semantics in CSBM as a main key of the transaction, calculating a hash value of the transaction, and using a 16-system character string with the length of 64 to represent the transaction, wherein Time is a Time attribute type, time is a timestamp of the transaction, coordinatate is a space attribute type, longitude is the longitude of the transaction, latitude is the latitude of the transaction, v i is an attribute value, keywords is a keyword attribute type, kw i is an attribute name of a certain keyword of the transaction, others is other attribute types, oth i is names of other attributes of the transaction, and setting different attribute names according to different application occasions and transaction types;
Step 2, constructing an on-chain two-stage index structure based on a B 2 F-BKM structure, wherein the structure consists of two parts, namely an inter-block B 2 F-tree and an intra-block BKM-tree, and indexing all blocks and transactions is completed;
Step 2.1, constructing BKM tree in each block to replace original Merkle tree, extracting all the space attribute type values and key attribute type values of the transactions in the block in the process of constructing new blocks, simulating KD tree structure, dividing the region according to the space attribute data of the transactions until each region only contains one transaction, storing the transactions in leaf nodes only, storing information related to data division in non-leaf nodes, adding bloom filter fields into KD tree, constructing bloom filter according to the key attribute data of the transactions, including all key words of child nodes in bloom filter of non-leaf nodes, calculating hash values of each node from bottom to top according to the construction mode of Merkle tree to form BKM-tree, and recording root node in block head;
Step 2.2, constructing a B 2 F-tree among blocks based on Block numbers, block creation time and keywords in the blocks, acquiring new Block data, wherein the new Block data comprises Block numbers of the new blocks, block-id, block creation time Timestamp, bloom filters bloomfilter-bf formed by the keywords in the blocks and address offsets in the Block chain files, constructing B 2 F-tree nodes, wherein the formats of keys in the leaf nodes are Block-id, timestamp and bf, the Block-id is from a Block Height at a Block head, the Timestamp is from Timestamp at the Block head and bf is the same as bloom filters in BKM-tree root nodes in the blocks, and the pointers in the leaf nodes record the address offsets of the blocks in the Block chain files;
Step 2.3, constructing a chain two-stage index structure B 2 F-BKM structure;
Step 3, designing a method for inquiring space-time keywords under the upper chain of a chain, acquiring inquiry conditions, inquiring the conditions on the chain according to a B 2 F-BKM structure of a two-stage index structure on the chain, namely, B 2 F-tree and BKM-tree, extracting connection attribute values of inquired matters, inquiring under the upper chain of the chain according to the values, and returning a final result;
Defining that space-time keyword query in a mixed storage blockchain environment is composed of four tuples, wherein Q= [ [ T s,Te ], S, W, join-attr ], wherein [ T s,Te ] is a time range of user query, S is a space range of user query, W is a keyword data set of user query and comprises a plurality of keywords, join-attr is a connection condition in the form of an equation, the left side of the equation is a keyword attribute name to be connected on a chain, and the right side of the equation is a connection keyword attribute name in a database table under the chain;
Traversing B 2 F-tree according to time and keyword condition in query Q, selecting branch pointer according to time condition based on 'timestamp, bf' portion of key in B 2 F-tree node, judging whether the node pointed by said pointer contains query keyword W according to bloom filter, if so, continuing traversing, otherwise, stopping, according to said method until leaf node, finding out all keys containing query keyword W and time is greater than or equal to 'T s', less than or equal to 'T e' and obtaining the positions of these blocks by pointer;
Step 3.3, after a specific block is queried, traversing BKM-tree in the block in turn according to space and keyword conditions in query Q, selecting subnodes from top to bottom according to a search mode of KD-tree, judging whether all query keywords are contained in bloom filters of the subnodes, traversing the subnodes if the subnodes are contained, and discarding the branch if the subnodes are not contained;
Step 3.4, extracting the value of the connection attribute of original data in Onchain-RS according to the connection condition join-attr in the query Q, querying the data in the under-chain database table according to the value, adding the under-chain result set Offchain-RS, reading the data in Offchain-RS and Onchain-RS, connecting the under-chain data through the connection attribute, and storing the result in the result set ResultSet;
Returning ResultSet, terminating the current calculation and waiting for the next call;
The invention has the beneficial technical effects that:
according to the invention, when the space-time keyword query is performed on the block chain based on the chain up-chain and down-chain hybrid storage mode, the query efficiency is low by applying the existing method, a large amount of unnecessary communication overhead can be generated, and the user experience is affected; therefore, the invention aims to provide a space-time keyword query method in a mixed storage block chain environment, which improves the query efficiency and reduces the communication overhead compared with the traditional query method. The method has the following specific beneficial technical effects:
1. The method provided by the invention processes the space-time keyword query problem under the mixed storage blockchain environment based on the blockchain model CSBM classified according to the attribute and endowed with the semantic, and can realize the efficient space-time keyword query under the mixed storage blockchain environment.
2. The space-time keyword query method constructs an on-chain two-stage index structure based on a B 2 F-BKM structure, the structure consists of two parts, namely an inter-block B 2 F-tree and an intra-block BKM-tree, and indexes of all blocks and transactions are completed.
3. According to the block chain connection query method facing the hybrid storage mode, the link condition query is performed by traversing the B 2 F-BKM structure, and then the link data connection query is performed on the link according to the connection attribute value.
Drawings
FIG. 1 is a schematic diagram of a supply chain scene in a space-time keyword query method in a hybrid storage blockchain environment, wherein the left side is a blockchain model CSBM classified according to attributes on a chain and provided with semantics, and the right side is an under-chain database;
FIG. 2 is a schematic diagram of a structure of a block CS-B classified by attribute and provided with semantics in a space-time keyword query method in a hybrid memory blockchain environment according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the overall structure of a two-level index structure B 2 F-BKM structure on a chain in a space-time keyword query method in a hybrid memory blockchain environment according to an embodiment of the present invention;
FIG. 4 is a flowchart of a structure of B 2 F-BKM for constructing a two-level index structure on a chain in a space-time keyword query method in a mixed storage blockchain environment according to an embodiment of the present invention;
FIG. 5 is a flowchart of a method for querying space-time keywords in a chain up-link and a chain down-link in a method for querying space-time keywords in a mixed storage block chain environment according to an embodiment of the invention;
FIG. 6 is a diagram illustrating an example query process of a space-time keyword query method under a chain-up and chain-down in a space-time keyword query method in a hybrid memory blockchain environment according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and examples;
Aiming at the problem of space-time keyword query of a chain up-chain and down-chain hybrid storage block chain, the invention provides a space-time keyword query method in a hybrid storage block chain environment. Firstly, aiming at weak query semantics, a blockchain model (Categorical AND SEMANTIC Blockchain Model, CSBM) which is classified by attributes and endows semantics is constructed, attribute types are divided for transactions in a block, and semantics are added. Secondly, aiming at low query efficiency, an on-chain two-stage index structure based on a B 2 F-BKM structure is constructed, and the structure can index all blocks and transactions, so that the query efficiency is effectively improved. Finally, a method for querying space-time keywords under the uplink of the chain is designed aiming at larger communication cost, the on-chain conditional query is performed by traversing the B 2 F-BKM structure, and then the data connection query under the uplink of the chain is performed according to the connection attribute value, so that unnecessary communication cost is reduced compared with the traditional query method.
A space-time keyword query method under a mixed storage block chain environment comprises the following specific steps:
Step 1, constructing a blockchain model (Categorical AND SEMANTIC Blockchain Model, CSBM) classified by attributes and endowed with semantics, wherein the structure is shown in the left part of the figure 1.
Step 1.1 CSBM contains a plurality of blocks classified by attribute and assigned semantics (CS-B), csbm=cs-B 1+CS-B2+···+CS-Bn, each CS-B contains two parts, a block header (B head) and a block classified by attribute and assigned semantics (CS-B body), the structure is shown in fig. 2. CS-b=b head+cs-B body. The Bhead has the same structure as the Block head of the traditional Block chain, and comprises a front Block hash value PrevHash for connecting blocks into a chain, a Block Height for recording the position of the current Block on the chain, a time stamp for recording the generation time of the Block, mining difficulty Bits and random numbers Nonce, wherein Merkle Root (Merkle Root) in the Block head is replaced by a Root node (BKM-tree Root) of the BKM-tree constructed in the step 2, and the hash value based on the transaction in the Block is generated. All transactions within the block are contained in the CS-B body, attribute types are divided for the transactions and semantics are added;
Step 1.2, defining T x as a transaction ,Tx={Key=v1,Time={time=v2},Coordinate={longitude=v3,latitude=v4},Keywords={kw1,kw2,···,kwn},Others={oth1,oth2,···,othn}},Key classified according to attributes and endowed with semantics in CSBM as a main key of the transaction, obtaining by calculating a hash value of the transaction, and using a 16-ary character string with the length of 64 to represent the transaction, wherein Time is a Time attribute type, time is a timestamp of the transaction, coordinates is a space attribute type, longitude is longitude of the transaction, latitude is latitude of the transaction, v i is an attribute value, keywords is a keyword attribute type, kw i is an attribute name of a certain keyword of the transaction, others is other attribute types, and oth i is names of other attributes of the transaction. Setting different attribute names according to different application occasions and transaction types;
In this example, transaction Keywords = { operation, good, factory } for recording production information, and transaction Keywords = { operation, good, factory, driver } for recording logistics information. Taking transaction T j+1 in fig. 2 as an example, the main key of the transaction is '8 c..9B', and a piece of logistics information is recorded, wherein at the time of 'T 2, the driver Tom transports goods 2 produced by the factory 2 to a (134.3°e,42.5°n) position';
And 2, constructing an on-chain two-stage index structure based on a B 2 F-BKM structure, wherein the structure is shown in figure 3. The structure consists of two parts, namely an inter-block B 2 F-tree and an intra-block BKM-tree, and the indexing of all blocks and transactions is completed, and the construction flow is shown in figure 4, and the specific process is as follows:
And 2.1, constructing a BKM-tree in each block to replace the original Merkle tree. In the process of constructing a new block, all transaction space attribute type values and keyword attribute type values in the block are extracted. Simulating a KD-tree structure, dividing regions according to the space attribute data of the transactions until each region only contains one transaction, wherein the transactions are only stored in leaf nodes, and non-leaf nodes store information related to data division; adding a bloom filter field into the KD-tree, constructing a bloom filter according to the key word attribute data of the transaction, wherein the bloom filter of the non-leaf node contains all key words of child nodes of the non-leaf node;
In this example, as shown on the right side of fig. 3, BKM-trees are built for all transactions in block n, for example, the root node contains the hash value of the node, spatial partition related information, and a bloom filter BF 7, which contains the key data for all transactions in block n. The leaf node P j+1 records the spatial information (134.3°e,42.5°n), the Key information (transport 2,goods2, tom) and the Key value (8 c..9b) of the transaction T j+1, and the hash value of this node is calculated from the above information in series hash.
And 2.2, constructing an inter-block B 2 F-tree based on the block number, the block creation time and the intra-block keywords. New block data is obtained, including the block number (block-id) of the new block, the block creation time (timestamp), the bloom filter (bloomfilter-bf) made up of intra-block keys, and the address offset of the block in the blockchain file. The B 2 F-tree node is constructed, and the format of the keys in the leaf nodes is 'Block-id, timestamp, bf', wherein the Block-id is from the Block Height (Block Height) at the Block head, the Timestamp is from the Timestamp (Timestamp) at the Block head, and bf is the same as the bloom filter in the BKM-tree root node in the Block. Pointers in leaf nodes record the address offsets of the blocks located in the blockchain file. All keywords of subtrees of the non-leaf nodes are contained in a bloom filter of the non-leaf nodes and are used for filtering pointers, judging whether the nodes pointed by the pointers contain corresponding keywords or not, and updating a B 2 F-tree according to a block-id and a B+ tree node insertion mode, wherein the structure is shown in figure 3;
In this example, as shown in FIG. 3, a B 2 F-tree is constructed for all blocks, e.g., the root node's bloom filter BF 5 contains all the keys in its subtree bloom filter BF 1,BF2, and the leaf node ((1, ts 1,bf1),p1) indicates that the first block was created at ts 1, and the key data of the intra-block transaction constitutes the location where bloom filter BF 1,p1 points to the block.
Step 2.3, constructing a chain two-stage index structure B 2 F-BKM structure;
And 3, designing a method for querying space-time keywords under the upper chain of the chain. Acquiring query conditions, carrying out on-chain conditional query according to a B 2 F-BKM structure of the on-chain two-stage index structure, namely a B 2 F-tree and a BKM-tree, extracting a connection attribute value of a queried transaction, carrying out on-chain and off-chain connection query according to the value, and returning a final result. The flow is shown in fig. 5, the example query process is shown in fig. 6, and the specific process is as follows:
Defining that space-time keyword query in a mixed storage blockchain environment is composed of four tuples, wherein Q= [ [ T s,Te ], S, W, join-attr ], wherein [ T s,Te ] is a time range of user query, S is a space range of user query, W is a keyword data set of user query and comprises a plurality of keywords, join-attr is a connection condition in the form of an equation, the left side of the equation is a keyword attribute name to be connected on a chain, and the right side of the equation is a connection keyword attribute name in a database table under the chain;
In this example, query q= [ [ t 7,t9],[S],[goods=goods2, operator=transport ], driver=driver.
And 3.2, traversing the B 2 F-tree according to the time and the keyword condition in the query Q. Based on the 'timestamp, bf' part of the keys in the B 2 F-tree node, starting from the root node, selecting a branch pointer according to time conditions, judging whether the node pointed by the pointer contains a query keyword W according to a bloom filter, and continuing traversing if the node contains the query keyword W, otherwise, stopping traversing;
In this example, the location information of the block is obtained by traversing the B 2 F-tree to obtain that only the block with the block number 9 meets the query time condition and includes all the query keywords.
And 3.3, after a specific block is inquired, traversing the BKM-tree positioned in the block in sequence according to the space and the keyword condition in the inquiry Q. And selecting the child node from top to bottom according to the searching mode of the KD-tree, judging whether all query keywords are contained in the bloom filter of the child node, traversing the child node if the query keywords are contained, and discarding the branch if the query keywords are not contained. According to this method, all transactions within the block that meet the space and key conditions are found up to the leaf node. Obtaining Key data in leaf nodes, namely hash values (Key) of transactions, obtaining original data according to the Key data, and adding the data into a result set (Onchain-RS) on a chain if a time condition is met;
In this example, the original data T j and T j+1 are found by traversing the BKM-tree in block 9 to obtain that nodes P j and P j+1 meet space and keyword query conditions, then reading the hash values Key j and Key j+1 stored in nodes P j and P j+1, finding that the original data meet time conditions, and adding the original data of transactions T j and T j+1 to Onchain-RS.
And 3.4, extracting the value of the connection attribute of the original data in Onchain-RS according to the connection condition join-attr in the query Q, querying the data in the database table under the chain according to the value, and adding an under-chain result set (Offchain-RS). After the data in Offchain-RS and Onchain-RS are read and connected with the data under the chain through the connection attribute, the result is stored in a result set ResultSet, in the example, the values Tom and James of the connection attribute Driver are obtained according to the original data, the data about the drivers Tom and James are searched in the under-chain Driver table, then the on-chain data are connected with the under-chain data through the connection attribute value, the results R1 and R2 are obtained, and the results are stored in a result set ResultSet.
And 3.5, returning ResultSet, terminating the current calculation and waiting for the next call.