A kind of B+ tree indexing means towards row storage DWMS
Technical field
The present invention relates to a kind of tree of the B+ towards row storage DWMS (Data Warehouse Management System) index technology.
Background technology
The high development of internet, applications, high-throughput and large buffer memory become the nowadays essential feature of database product, day by day urgent for issued transaction and the high performance requirement of query analysis.Traditional row stored data base can not be for business decision provides efficient query analysis as processing transactions application, the database schema of row storage is in recent years by re-examine, for the inquiry type work of reading to optimize in data warehouse or in analysis support application, row storage list reveals than row to be stored and has performance more significantly.Because relation table is in external performance, be still logical line, to be therefore connected with multilist be to be listed as the key factor that affects query performance in storing in tuple reconstruct.Index is one of important technology improving search efficiency.B+ tree index can keep data to store in order, and have advantages of allow to search, sequential access, insertion and deletion, make it in transaction environment, become the most popular index structure of Database Systems.
In traditional Database Systems, the variation of the B+ tree construction that data insertion and modification bring is frequently very large.In order to reduce the possibility that causes node split when data are inserted and revised, the node of B+ tree is not what fill up.But in data warehouse, almost do not have data to insert and retouching operation.Traditional B+tree is applied in the data warehouse of mass data storage and will causes the waste in space.Meanwhile, because node is not filled completely, data storage needs more node, and this will increase the height of B+ tree index, reduces the search efficiency of data.
Summary of the invention
The object of this invention is to provide a kind of B+ tree index that is applied to row storage DWMS, overcome the limitation of traditional B+tree index, improve the search efficiency of data.
In order to achieve the above object, technical scheme of the present invention has been to provide a kind of B+ tree indexing means towards row storage DWMS, it is characterized in that, step is:
Step 1, column data generate: import user data, the original data by row storage are divided vertically into single-row, for each item number of each row is according to according to the line number at its place, add the line number item of rebuilding for tuple, form two tuple (line numbers, train value), request for data section, is kept at the every train value data that newly produce in a data segment;
If step 2 B+ tree key word is line number data, turn to step 4 to create; If B+ tree key word is train value data, goes to step 3 and first train value is sorted;
Step 3, multiway merge and heapsort are used in combination train value data are sorted;
Step 4, the initialization of B+ tree index;
Step 5, establishment B+ leaf node: the leafy node of application B+ tree, data item is filled directly into leafy node and obtains data block, form the 0th layer, B+ tree;
The feature of step 5 is:
According to row storage characteristics, direct save data information in leaf piece, but not the pointer of key assignments and sensing data block.In search procedure, can directly obtain data by leafy node like this, reduce one time I/O; When node is filled, no longer consider sparse coefficient, the each node in tree fills up, and has adopted the principle of " key for searching number is consistent with pointer number ", room for promotion utilization factor.
Step 6, generation intermediate node: iteration is set up the middle layer node of B+ tree until whole B+ tree creates end from the bottom to top.
Preferably, described step 3 comprises:
Step 3.1, initialization: in internal memory, apply for a sequence district, its size is designated as K, and making K is the integral multiple of block size.If pending data is counted Blk_num > K according to piece, the method for multiway merge external sort will be adopted.The size of M while calculating multiway merge, sub-list number when M is merger, order
in each sub-list for the treatment of merger, piece is counted D=Blk_num/M.
Source data in step 3.2, the section of reading, the train value Data Division in data block two tuples out, packs dataitem array into, puts into sequence district.Use Heap algorithm using dataitem array as input parameter, dataitem array is sorted; If Blk_num is <=K, sequence finishes; Otherwise, sorted data item is reassembled into piece, write back in interim section.
Step 3.3, the interim section of M is carried out to merge sort.
Preferably, described step 6 comprises:
Step 6.1, generating indexes item, the tlv triple that first train value, the line number of correspondence and the piece of this piece number that index entry is served as reasons in lower one deck piece forms;
Step 6.2, judge that train value type is fixed-length data or elongated data, if fixed length train value, calculate the length of index entry, and then can obtain the index entry number=index block space size/index entry length in index block, the index entry number in the data block number/each node in index block number=0th in ground floor layer.Application index block space, is inserted into index entry in index block in batches; If elongated train value, index block of first to file, then inserts tlv triple successively, until just apply for new index block while failing to lay down;
After step 6.3, one deck have created, turn to step 5.1, create last layer intermediate node according to same process, finish until whole B+ tree creates.
The invention provides a kind of B+ tree index tool that is applied to row storage DWMS has the following advantages:
1) provide improved B+ tree index node, the direct memory row data of leafy node, intermediate node is no longer considered sparse coefficient, all fills up, and has increased space availability ratio.For the data of same number, this structure guarantees that B tree hierachy is the shortest, has reduced and has searched number of times;
2) traditional insertion has been abandoned in the foundation of B+ tree, uses the method for bottom-up structure B+ tree.Make not need to consider in this way splitting operation, reduced a large amount of expenses;
3) line number in row storage system and train value data are set up respectively to B+ tree index, keep data to store in order, be convenient to tuple reconstruct and be connected with multilist.
Accompanying drawing explanation
Fig. 1 is the structure that B+ sets inner node (index entry);
Fig. 2 is the structure of B+ leaf node (data page).
Embodiment
For the present invention is become apparent, be hereby described in detail as follows with a preferred embodiment.
The invention provides a kind of B+ tree indexing means towards row storage DWMS, the steps include:
Step 1, column data generate: import user data, the original data by row storage are divided vertically into single-row, for each item number of each row is according to according to the line number at its place, add the line number item of rebuilding for tuple, form two tuple (line numbers, train value), request for data section, is kept at the every train value data that newly produce in a data segment;
If step 2 B+ tree key word is line number data, turn to step 4 to create; If B+ tree key word is train value data, goes to step 3 and first train value is sorted;
Step 3, multiway merge and heapsort are used in combination train value data are carried out to key assignments sequence, this step comprises:
Step 3.1, initialization: in internal memory, apply for a sequence district, its size is designated as K, and making K is the integral multiple of block size.If pending data is counted Blk_num > K according to piece, the method for multiway merge external sort will be adopted.The size of M while calculating multiway merge, sub-list number when M is merger, order
in each sub-list for the treatment of merger, piece is counted D=Blk_num/M.
Source data in step 3.2, the section of reading, the train value Data Division in data block two tuples out, packs dataitem array into, puts into sequence district.Use Heap algorithm using dataitem array as input parameter, dataitem array is sorted; If Blk_num is <=K, sequence finishes; Otherwise, sorted data item is reassembled into piece, write back in interim section.
Step 3.3, the interim section of M is carried out to merge sort.
Step 4, B+ tree initialization, specifically comprise the descriptor of B+ tree carried out to assignment, needs theing contents are as follows of assignment:
(1) type of B+ tree index key assignments, comprises fixed/elongated, value type whether.The key assignments of fixed length comprises: SMALLINT, INT, NUMBER, CHAR, DATE, TIME; Elongated key assignments has the VARCHAR of comprising.
(2) information of row, comprises row name, row type, row length, if row are elongated, specifies maximum length.
(3) be B+ tree root piece allocation space.
(4) the level value that B+ tree is set is 0.
Step 5, establishment leafy node: the leafy node of application B+ tree, data item is filled directly into leafy node and obtains data block, form the 0th layer, B+ tree; This step is according to row storage characteristics, direct save data information in leaf piece, but not the pointer of key assignments and sensing data block.In search procedure, can directly obtain data by leafy node like this, reduce one time I/O.In addition, when node is filled, no longer consider sparse coefficient, the each node in tree fills up, room for promotion utilization factor, as shown in Figures 1 and 2.KEY in figure
iby < line number, train value > two parts composition.
Step 6, generation intermediate node: iteration is set up the middle layer node of B+ tree until whole B+ tree creates end from the bottom to top, the steps include:
Step 6.1, generating indexes item, the tlv triple that index entry is made up of first train value in lower one deck piece, the line number of correspondence and the piece number of this piece, the length of computation index item, is designated as TL;
Step 6.2, application index block space: judge that according to initialization information in step 4 train value is fixed length or elongated, if fixed length train value calculates the length of index entry.Index entry number=each index block space size/index entry length T L in index block.Calculate the index entry number in the data block number/each index block in index block number=0th in ground floor layer.Apply for the index block space of this layer.Index entry is inserted in index block in batches; If elongated train value, index block of first to file, then inserts tlv triple successively, until just apply for new index block while failing to lay down.
After step 6.3, one deck have created, level=level+1; Turn to step 6.1, create last layer intermediate node according to same process, difference is the index entry number in index block number/each index block of the index block number=lower one deck that calculates current layer.In the time that the index block number in this layer is 1, iteration finishes, and this index block is the root node of B+ tree.