+

CN102609490B - Column-storage-oriented B+ tree index method for DWMS (data warehouse management system) - Google Patents

Column-storage-oriented B+ tree index method for DWMS (data warehouse management system) Download PDF

Info

Publication number
CN102609490B
CN102609490B CN201210019935.5A CN201210019935A CN102609490B CN 102609490 B CN102609490 B CN 102609490B CN 201210019935 A CN201210019935 A CN 201210019935A CN 102609490 B CN102609490 B CN 102609490B
Authority
CN
China
Prior art keywords
data
index
tree
column
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210019935.5A
Other languages
Chinese (zh)
Other versions
CN102609490A (en
Inventor
夏小玲
乐嘉锦
王梅
李晔锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University filed Critical Donghua University
Priority to CN201210019935.5A priority Critical patent/CN102609490B/en
Publication of CN102609490A publication Critical patent/CN102609490A/en
Application granted granted Critical
Publication of CN102609490B publication Critical patent/CN102609490B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本发明涉及一种面向列存储DWMS的B+树索引方法,其特征在于,步骤为:步骤1、列数据生成;步骤2、若B+树关键字为行号,则转向步骤4进行创建,否则转步骤3进行排序;步骤3、将多路归并和堆排序组合使用对列值数据进行排序;步骤4、B+树索引初始化;步骤5、创建叶子结点;步骤6、自底向上产生中间结点。本发明提供一种应用于列存储DWMS的B+树索引具有如下优点:1)保证B树层次最短,减少了查找次数;2)B+树的建立抛弃了传统的插入法,使用自底向上构造B+树的方法。使用这种方法不需要考虑分裂操作,减少了大量的开销。

The present invention relates to a B+ tree index method for column storage DWMS, characterized in that the steps are: step 1, column data generation; step 2, if the B+ tree key is row number, turn to step 4 to create, otherwise turn to Step 3 sorting; step 3, combining multi-way merge and heap sorting to sort column value data; step 4, B+ tree index initialization; step 5, creating leaf nodes; step 6, generating intermediate nodes from bottom to top . The present invention provides a B+ tree index applied to column storage DWMS, which has the following advantages: 1) the B tree level is guaranteed to be the shortest, reducing the number of searches; tree method. Using this method does not need to consider the split operation, which reduces a lot of overhead.

Description

A kind of B+ tree indexing means towards row storage DWMS
Technical field
The present invention relates to a kind of tree of the B+ towards row storage DWMS (Data Warehouse Management System) index technology.
Background technology
The high development of internet, applications, high-throughput and large buffer memory become the nowadays essential feature of database product, day by day urgent for issued transaction and the high performance requirement of query analysis.Traditional row stored data base can not be for business decision provides efficient query analysis as processing transactions application, the database schema of row storage is in recent years by re-examine, for the inquiry type work of reading to optimize in data warehouse or in analysis support application, row storage list reveals than row to be stored and has performance more significantly.Because relation table is in external performance, be still logical line, to be therefore connected with multilist be to be listed as the key factor that affects query performance in storing in tuple reconstruct.Index is one of important technology improving search efficiency.B+ tree index can keep data to store in order, and have advantages of allow to search, sequential access, insertion and deletion, make it in transaction environment, become the most popular index structure of Database Systems.
In traditional Database Systems, the variation of the B+ tree construction that data insertion and modification bring is frequently very large.In order to reduce the possibility that causes node split when data are inserted and revised, the node of B+ tree is not what fill up.But in data warehouse, almost do not have data to insert and retouching operation.Traditional B+tree is applied in the data warehouse of mass data storage and will causes the waste in space.Meanwhile, because node is not filled completely, data storage needs more node, and this will increase the height of B+ tree index, reduces the search efficiency of data.
Summary of the invention
The object of this invention is to provide a kind of B+ tree index that is applied to row storage DWMS, overcome the limitation of traditional B+tree index, improve the search efficiency of data.
In order to achieve the above object, technical scheme of the present invention has been to provide a kind of B+ tree indexing means towards row storage DWMS, it is characterized in that, step is:
Step 1, column data generate: import user data, the original data by row storage are divided vertically into single-row, for each item number of each row is according to according to the line number at its place, add the line number item of rebuilding for tuple, form two tuple (line numbers, train value), request for data section, is kept at the every train value data that newly produce in a data segment;
If step 2 B+ tree key word is line number data, turn to step 4 to create; If B+ tree key word is train value data, goes to step 3 and first train value is sorted;
Step 3, multiway merge and heapsort are used in combination train value data are sorted;
Step 4, the initialization of B+ tree index;
Step 5, establishment B+ leaf node: the leafy node of application B+ tree, data item is filled directly into leafy node and obtains data block, form the 0th layer, B+ tree;
The feature of step 5 is:
According to row storage characteristics, direct save data information in leaf piece, but not the pointer of key assignments and sensing data block.In search procedure, can directly obtain data by leafy node like this, reduce one time I/O; When node is filled, no longer consider sparse coefficient, the each node in tree fills up, and has adopted the principle of " key for searching number is consistent with pointer number ", room for promotion utilization factor.
Step 6, generation intermediate node: iteration is set up the middle layer node of B+ tree until whole B+ tree creates end from the bottom to top.
Preferably, described step 3 comprises:
Step 3.1, initialization: in internal memory, apply for a sequence district, its size is designated as K, and making K is the integral multiple of block size.If pending data is counted Blk_num > K according to piece, the method for multiway merge external sort will be adopted.The size of M while calculating multiway merge, sub-list number when M is merger, order
Figure BDA0000132910760000021
in each sub-list for the treatment of merger, piece is counted D=Blk_num/M.
Source data in step 3.2, the section of reading, the train value Data Division in data block two tuples out, packs dataitem array into, puts into sequence district.Use Heap algorithm using dataitem array as input parameter, dataitem array is sorted; If Blk_num is <=K, sequence finishes; Otherwise, sorted data item is reassembled into piece, write back in interim section.
Step 3.3, the interim section of M is carried out to merge sort.
Preferably, described step 6 comprises:
Step 6.1, generating indexes item, the tlv triple that first train value, the line number of correspondence and the piece of this piece number that index entry is served as reasons in lower one deck piece forms;
Step 6.2, judge that train value type is fixed-length data or elongated data, if fixed length train value, calculate the length of index entry, and then can obtain the index entry number=index block space size/index entry length in index block, the index entry number in the data block number/each node in index block number=0th in ground floor layer.Application index block space, is inserted into index entry in index block in batches; If elongated train value, index block of first to file, then inserts tlv triple successively, until just apply for new index block while failing to lay down;
After step 6.3, one deck have created, turn to step 5.1, create last layer intermediate node according to same process, finish until whole B+ tree creates.
The invention provides a kind of B+ tree index tool that is applied to row storage DWMS has the following advantages:
1) provide improved B+ tree index node, the direct memory row data of leafy node, intermediate node is no longer considered sparse coefficient, all fills up, and has increased space availability ratio.For the data of same number, this structure guarantees that B tree hierachy is the shortest, has reduced and has searched number of times;
2) traditional insertion has been abandoned in the foundation of B+ tree, uses the method for bottom-up structure B+ tree.Make not need to consider in this way splitting operation, reduced a large amount of expenses;
3) line number in row storage system and train value data are set up respectively to B+ tree index, keep data to store in order, be convenient to tuple reconstruct and be connected with multilist.
Accompanying drawing explanation
Fig. 1 is the structure that B+ sets inner node (index entry);
Fig. 2 is the structure of B+ leaf node (data page).
Embodiment
For the present invention is become apparent, be hereby described in detail as follows with a preferred embodiment.
The invention provides a kind of B+ tree indexing means towards row storage DWMS, the steps include:
Step 1, column data generate: import user data, the original data by row storage are divided vertically into single-row, for each item number of each row is according to according to the line number at its place, add the line number item of rebuilding for tuple, form two tuple (line numbers, train value), request for data section, is kept at the every train value data that newly produce in a data segment;
If step 2 B+ tree key word is line number data, turn to step 4 to create; If B+ tree key word is train value data, goes to step 3 and first train value is sorted;
Step 3, multiway merge and heapsort are used in combination train value data are carried out to key assignments sequence, this step comprises:
Step 3.1, initialization: in internal memory, apply for a sequence district, its size is designated as K, and making K is the integral multiple of block size.If pending data is counted Blk_num > K according to piece, the method for multiway merge external sort will be adopted.The size of M while calculating multiway merge, sub-list number when M is merger, order in each sub-list for the treatment of merger, piece is counted D=Blk_num/M.
Source data in step 3.2, the section of reading, the train value Data Division in data block two tuples out, packs dataitem array into, puts into sequence district.Use Heap algorithm using dataitem array as input parameter, dataitem array is sorted; If Blk_num is <=K, sequence finishes; Otherwise, sorted data item is reassembled into piece, write back in interim section.
Step 3.3, the interim section of M is carried out to merge sort.
Step 4, B+ tree initialization, specifically comprise the descriptor of B+ tree carried out to assignment, needs theing contents are as follows of assignment:
(1) type of B+ tree index key assignments, comprises fixed/elongated, value type whether.The key assignments of fixed length comprises: SMALLINT, INT, NUMBER, CHAR, DATE, TIME; Elongated key assignments has the VARCHAR of comprising.
(2) information of row, comprises row name, row type, row length, if row are elongated, specifies maximum length.
(3) be B+ tree root piece allocation space.
(4) the level value that B+ tree is set is 0.
Step 5, establishment leafy node: the leafy node of application B+ tree, data item is filled directly into leafy node and obtains data block, form the 0th layer, B+ tree; This step is according to row storage characteristics, direct save data information in leaf piece, but not the pointer of key assignments and sensing data block.In search procedure, can directly obtain data by leafy node like this, reduce one time I/O.In addition, when node is filled, no longer consider sparse coefficient, the each node in tree fills up, room for promotion utilization factor, as shown in Figures 1 and 2.KEY in figure iby < line number, train value > two parts composition.
Step 6, generation intermediate node: iteration is set up the middle layer node of B+ tree until whole B+ tree creates end from the bottom to top, the steps include:
Step 6.1, generating indexes item, the tlv triple that index entry is made up of first train value in lower one deck piece, the line number of correspondence and the piece number of this piece, the length of computation index item, is designated as TL;
Step 6.2, application index block space: judge that according to initialization information in step 4 train value is fixed length or elongated, if fixed length train value calculates the length of index entry.Index entry number=each index block space size/index entry length T L in index block.Calculate the index entry number in the data block number/each index block in index block number=0th in ground floor layer.Apply for the index block space of this layer.Index entry is inserted in index block in batches; If elongated train value, index block of first to file, then inserts tlv triple successively, until just apply for new index block while failing to lay down.
After step 6.3, one deck have created, level=level+1; Turn to step 6.1, create last layer intermediate node according to same process, difference is the index entry number in index block number/each index block of the index block number=lower one deck that calculates current layer.In the time that the index block number in this layer is 1, iteration finishes, and this index block is the root node of B+ tree.

Claims (1)

1.一种面向列存储DWMS的B+树索引方法,其特征在于,步骤为:1. A B+ tree index method for column storage DWMS, characterized in that the steps are: 步骤1、列数据生成:导入用户数据,将原始按行存储的数据垂直划分为单列,为每一列的每一项数据根据其所在的行号,添加用于元组重建的行号项,形成二元组(行号,列值),申请数据段,将新产生的每列值数据保存在一个数据段中;Step 1. Column data generation: import user data, divide the original data stored by row vertically into single columns, add row number items for tuple reconstruction to each item of data in each column according to the row number where it is located, and form Two-tuple (row number, column value), apply for a data segment, and save the newly generated data of each column value in a data segment; 步骤2、若B+树关键字为行号数据,则转向步骤4进行创建;若B+树关键字为列值数据,转步骤3首先对列值进行排序;Step 2. If the B+ tree key is row number data, go to step 4 to create it; if the B+ tree key is column value data, go to step 3 to sort the column values first; 步骤3、将多路归并和堆排序组合使用对列值数据进行排序,所述步骤3包括:Step 3. Combining multi-way merge and heap sorting to sort column value data, said step 3 includes: 步骤3.1、初始化:在内存中申请一个排序区,其大小记为K,令K是待排序的数据块大小的整数倍,若待排序数据块数Blk_num>K,将采用多路归并外排序的方法,归并时的子列表个数为M,令
Figure FDA0000442824400000011
每个待归并的子列表内块数D=Blk_num/M,读取数据段中源数据,把数据块二元组中的列值数据拆分出来,装入数组,放入排序区,使用堆排序算法将数组作为输入参数,对数组进行排序;若Blk_num<=K,则排序结束;否则,把排好序的数据项重新组装成数据块,写回临时段中;
Step 3.1. Initialization: Apply for a sorting area in the memory, and its size is recorded as K. Let K be an integer multiple of the size of the data blocks to be sorted. If the number of data blocks to be sorted Blk_num>K, multi-way merge and external sorting will be used method, the number of sublists when merging is M, let
Figure FDA0000442824400000011
The number of blocks in each sublist to be merged is D=Blk_num/M, read the source data in the data segment, split the column value data in the data block binary group, load it into the array, put it into the sorting area, and use the heap sorting algorithm Use the array as an input parameter to sort the array; if Blk_num<=K, the sorting ends; otherwise, reassemble the sorted data items into data blocks and write them back to the temporary segment;
步骤3.2、对N个临时段进行归并排序;Step 3.2, merging and sorting the N temporary segments; 步骤4、B+树索引初始化;Step 4, B+ tree index initialization; 步骤5、创建B+树叶子结点:申请B+树的叶子结点,将数据项直接填充入叶子结点得到数据块,构成B+树第0层;Step 5. Create B+ tree leaf nodes: Apply for the leaf nodes of the B+ tree, fill the data items directly into the leaf nodes to obtain data blocks, and form the 0th layer of the B+ tree; 步骤6、产生中间结点:由下至上迭代建立B+树的中间层结点直到整棵B+树创建结束,其步骤为:Step 6. Generate intermediate nodes: iteratively build the intermediate layer nodes of the B+ tree from bottom to top until the entire B+ tree is created. The steps are: 步骤6.1、生成索引项,索引项为由下一层索引块中的第一个列值、对应的行号以及该索引块的块号构成的三元组;Step 6.1, generate an index item, the index item is a triplet composed of the first column value in the index block of the next layer, the corresponding row number and the block number of the index block; 步骤6.2、判断列值类型是定长数据或变长数据,若是定长列值,计算得到索引项的长度,进而可得索引块中的索引项个数=索引块空间大小/索引项长度,第一层中的索引块个数=第0层中的数据块个数/每个结点中的索引项个数,申请索引块空间,将索引项批量插入到索引块中;若是变长列值,先申请一个索引块,然后依次插入三元组,直到放不下时才申请新的索引块;Step 6.2. Determine whether the column value type is fixed-length data or variable-length data. If it is a fixed-length column value, calculate the length of the index item, and then obtain the number of index items in the index block = index block space size / index item length, The number of index blocks in the first layer = the number of data blocks in the 0th layer/the number of index items in each node, apply for index block space, and insert index items into index blocks in batches; if it is a variable-length column Value, apply for an index block first, then insert triples in sequence, and apply for a new index block until it cannot fit; 步骤6.3、一层创建完成后,转向步骤6.1,按照同样的过程创建上一层中间结点,直到整棵B+树创建结束。Step 6.3 After the creation of one layer is completed, turn to step 6.1 and follow the same process to create the upper layer of intermediate nodes until the creation of the entire B+ tree is completed.
CN201210019935.5A 2012-01-20 2012-01-20 Column-storage-oriented B+ tree index method for DWMS (data warehouse management system) Expired - Fee Related CN102609490B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210019935.5A CN102609490B (en) 2012-01-20 2012-01-20 Column-storage-oriented B+ tree index method for DWMS (data warehouse management system)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210019935.5A CN102609490B (en) 2012-01-20 2012-01-20 Column-storage-oriented B+ tree index method for DWMS (data warehouse management system)

Publications (2)

Publication Number Publication Date
CN102609490A CN102609490A (en) 2012-07-25
CN102609490B true CN102609490B (en) 2014-07-02

Family

ID=46526862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210019935.5A Expired - Fee Related CN102609490B (en) 2012-01-20 2012-01-20 Column-storage-oriented B+ tree index method for DWMS (data warehouse management system)

Country Status (1)

Country Link
CN (1) CN102609490B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284299A (en) * 2015-06-08 2019-01-29 南京航空航天大学 Method for refactoring hybrid indexes with storage awareness

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103870492B (en) * 2012-12-14 2017-08-04 腾讯科技(深圳)有限公司 A kind of date storage method and device based on key row sequence
CN104268146A (en) * 2014-08-21 2015-01-07 南京邮电大学 Static B+-tree index method suitable for analytic applications
CN104601732B (en) * 2015-02-12 2018-01-23 北京金和软件股份有限公司 A kind of quick method for realizing multichannel data merger
CN107066551B (en) * 2017-03-23 2020-04-03 中国科学院计算技术研究所 A row and column storage method and system for tree-like data
CN106980796B (en) * 2017-03-27 2020-03-06 河南科技大学 Multi-domain connection keyword search method based on MDB+ tree in cloud environment
CN107273483B (en) * 2017-06-06 2019-11-05 贵州易鲸捷信息技术有限公司 The access method and system of sparse data
CN108920708B (en) * 2018-07-20 2021-04-27 新华三技术有限公司 Data processing method and device
CN109522271B (en) * 2018-10-22 2021-05-18 郑州云海信息技术有限公司 Batch insertion and deletion method and device for B + tree nodes
CN111275203B (en) * 2020-02-11 2024-10-29 深圳前海微众银行股份有限公司 Decision tree construction method, device, equipment and storage medium based on column storage

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751406B (en) * 2008-12-18 2012-01-04 赵伟 Method and device for realizing column storage based relational database

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284299A (en) * 2015-06-08 2019-01-29 南京航空航天大学 Method for refactoring hybrid indexes with storage awareness
CN109284299B (en) * 2015-06-08 2021-08-10 南京航空航天大学 Method for reconstructing a hybrid index with storage awareness

Also Published As

Publication number Publication date
CN102609490A (en) 2012-07-25

Similar Documents

Publication Publication Date Title
CN102609490B (en) Column-storage-oriented B+ tree index method for DWMS (data warehouse management system)
CN103745008B (en) A kind of sort method of big data directory
CN104199986B (en) Vector data space index method based on hbase and geohash
CN102737033B (en) Data processing equipment and data processing method thereof
Hjaltason et al. Speeding up construction of PMR quadtree-based spatial indexes
CN102890722B (en) Indexing method applied to time sequence historical database
CN105095520B (en) The distributed memory database indexing means of structure-oriented data
CN102722531B (en) Query method based on regional bitmap indexes in cloud environment
CN101673307B (en) Space data index method and system
CN103425772A (en) Method for searching massive data with multi-dimensional information
WO2006046669A1 (en) Database management device, method and program
CN107273443B (en) A Hybrid Indexing Method Based on Big Data Model Metadata
CN103914483B (en) File memory method, device and file reading, device
CN108009265B (en) A spatial data indexing method in cloud computing environment
CN106991149B (en) A Massive Spatial Object Storage Method Integrating Encoding and Multi-version Data
CN102867066A (en) Data summarization device and data summarization method
JP2023543004A (en) Merge update method, device, and medium for R-tree index based on Hilbert curve
CN103853499A (en) Multi-source data storing and reading method
CN116382588A (en) LSM-Tree storage engine read amplification problem optimization method based on learning index
CN110597805B (en) A method for processing memory index structure
CN104268146A (en) Static B+-tree index method suitable for analytic applications
Sheng et al. Dynamic top-k range reporting in external memory
Roumelis et al. Bulk-loading and bulk-insertion algorithms for xBR+-trees in solid state drives
CN117312486B (en) Dictionary division two-layer structure encryption index creation method supporting quick encryption document ordering retrieval
Carter et al. Nanosecond indexing of graph data with hash maps and VLists

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140702

Termination date: 20170120

CF01 Termination of patent right due to non-payment of annual fee
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载