CN104361113B

CN104361113B - A kind of OLAP query optimization method under internal memory flash memory mixing memory module

Info

Publication number: CN104361113B
Application number: CN201410717830.6A
Authority: CN
Inventors: 张延松; 张宇; 王珊
Original assignee: Renmin University of China
Current assignee: Renmin University of China
Priority date: 2014-12-01
Filing date: 2014-12-01
Publication date: 2017-06-06
Anticipated expiration: 2034-12-01
Also published as: CN104361113A

Abstract

本发明涉及一种内存‑闪存混合存储模式下的OLAP查询优化方法，其包括：OLAP存储采用flash‑aware的存储模型，在相对较小的DRAM和相对较大flash存储之间按数据访问的局部性进行划分，在异构的两级内存上进行存储优化；内存OLAP采用数组存储，每个属性列存储于连续的数组单元中，将传统的连接操作简化为数组下标访问，进行AIR算法的OLAP查询处理；其中AIR为数组下标访问；将基于数组存储和AIR算法的OLAP查询处理分解为三个顺序的数据访问过程；通过选择向量访问flash存储度量列指定度量值；采用在基于关键字的位图连接索引的基础上在DRAM‑flash两级存储中优化存储K个关键字连接位图，形成二级连接位图索引结构。本发明能提高内存存储性价比、内存和CPU使用效率以及数据存储效率，能广泛应用于通用OLAP应用场景中。The present invention relates to an OLAP query optimization method under a memory-flash memory hybrid storage mode, which comprises: OLAP storage adopts a flash-aware storage model, and accesses parts according to data between a relatively small DRAM and a relatively large flash storage The property is divided, and the storage is optimized on the heterogeneous two-level memory; the memory OLAP adopts array storage, and each attribute column is stored in a continuous array unit, and the traditional connection operation is simplified as an array subscript access, and the AIR algorithm is implemented. OLAP query processing; where AIR is array subscript access; decompose OLAP query processing based on array storage and AIR algorithm into three sequential data access processes; access flash storage metric columns to specify metric values by selecting vectors; use keyword-based On the basis of the bitmap connection index, K keyword connection bitmaps are optimally stored in the DRAM-flash two-level storage to form a two-level connection bitmap index structure. The invention can improve the cost performance of memory storage, memory and CPU usage efficiency and data storage efficiency, and can be widely used in general OLAP application scenarios.

Description

An OLAP query optimization method in memory-flash hybrid storage mode

技术领域technical field

本发明涉及一种数据库领域中存储优化和OLAP(联机分析处理)查询优化方法，特别是关于一种适用于内存数据库一体机平台的以DRAM(动态随机存取存储器)和Flash(闪存)两级存储为基础的内存-闪存混合存储模式下的OLAP查询优化方法。The invention relates to a storage optimization and OLAP (Online Analytical Processing) query optimization method in the field of databases, in particular to a two-level DRAM (Dynamic Random Access Memory) and Flash (flash memory) suitable for the memory database integrated machine platform OLAP query optimization method under storage-based memory-flash hybrid storage mode.

背景技术Background technique

内存分析处理(内存OLAP)是大数据实时分析处理的重要技术，在大内存和多核处理器并行处理能力的支持下，内存OLAP具有优异的实时分析处理能力，但相对于其他存储设备，如flash、磁盘等，内存仍然是非常昂贵的存储介质，而且在存储能耗上比flash高一个数量级(DRAM：～100mW/GB，NAND flash：1-10mW/GB)，内存OLAP分析处理需要以大数据为基础，内存分析处理的硬件成本很高。PCIe Flash Card作为一种大容量(几百GB至TB级存储容量)高速存储技术，已经被大量应用于高性能数据库领域，如Oracle Exadata X3内存数据库一体机配置了大容量的高速闪存卡，并提供Smart Flash Cache缓存热数据，通过数据库逻辑优化缓存算法并能够基于表指定缓存优化策略。高速闪存卡的应用一方面为昂贵的内存存储提供了廉价的二级存储扩展能力，但另一方面高速闪存卡主要用作数据库在flash上的扩展缓存，扩展了内存缓存(buffer)的容量，但并没有与内存OLAP的存储优化及查询处理优化技术结合起来，并没有在OLAP算法层面实现flash-aware的OLAP优化技术。In-memory analysis processing (in-memory OLAP) is an important technology for real-time analysis and processing of big data. With the support of large memory and parallel processing capabilities of multi-core processors, in-memory OLAP has excellent real-time analysis and processing capabilities, but compared to other storage devices, such as flash , disk, etc., memory is still a very expensive storage medium, and its storage energy consumption is an order of magnitude higher than that of flash (DRAM: ~100mW/GB, NAND flash: 1-10mW/GB), memory OLAP analysis and processing need to use big data As a basis, the hardware cost of in-memory analytical processing is high. PCIe Flash Card, as a large-capacity (hundreds of GB to TB-level storage capacity) high-speed storage technology, has been widely used in the field of high-performance databases. Provide Smart Flash Cache to cache hot data, optimize the cache algorithm through database logic, and specify cache optimization strategies based on tables. On the one hand, the application of high-speed flash memory cards provides cheap secondary storage expansion capabilities for expensive memory storage, but on the other hand, high-speed flash memory cards are mainly used as extended caches for databases on flash, expanding the capacity of memory caches (buffers), However, it has not been combined with in-memory OLAP storage optimization and query processing optimization technology, and has not implemented flash-aware OLAP optimization technology at the OLAP algorithm level.

当前的分析型内存数据库主要以DRAM为主存储设备，flash用作替代磁盘的后备存储或磁盘缓存，还没有将flash纳入内存的OLAP算法设计中。如何将内存和大容量flash所形成的二级存储模式应用到内存分析处理领域，使之成为高性能、高性价比的主流平台，并且内存数据库不仅要支持完全内存模式的分析处理，也需要支持DRAM-flash两级存储下的内存分析处理成为亟待解决的技术问题。The current analytical in-memory database mainly uses DRAM as the main storage device, and flash is used as a backup storage or disk cache to replace the disk. Flash has not yet been incorporated into the OLAP algorithm design of the memory. How to apply the secondary storage mode formed by memory and large-capacity flash to the field of memory analysis and processing, making it a high-performance, cost-effective mainstream platform, and the memory database must not only support the analysis and processing of the full memory model, but also need to support DRAM - The memory analysis and processing under the flash two-level storage has become an urgent technical problem to be solved.

发明内容Contents of the invention

针对上述问题，本发明的目的是提供一种内存-闪存混合存储模式下的OLAP查询优化方法，该方法基于DRAM-flash两级存储，通过大容量廉价的flash提高内存存储性价比。同时，该方法能有效提高数据存储效率。进一步，本发明提供的方法可以有效提高内存和CPU使用效率。In view of the above problems, the object of the present invention is to provide an OLAP query optimization method under a memory-flash memory hybrid storage mode. The method is based on DRAM-flash two-level storage, and the cost performance of memory storage is improved through large-capacity and cheap flash. At the same time, the method can effectively improve data storage efficiency. Further, the method provided by the present invention can effectively improve memory and CPU usage efficiency.

为实现上述目的，本发明采取以下技术方案：一种内存-闪存混合存储模式下的OLAP查询优化方法，其包括以下步骤：1)OLAP存储采用flash-aware的存储模型，即根据OLAP星形或雪花形模型中维表较小、谓词操作较多的特点和事实表由外键和度量属性组成的特点，在相对较小的DRAM和相对较大flash存储之间按数据访问的局部性进行划分，在异构的两级内存上进行存储优化；2)内存OLAP采用数组存储，每个属性列存储于连续的数组单元中，表由相同长度的各属性数组组成，维表使用数组下标作为主键，事实表外键为维表中相应记录的数据下标，事实表记录能够根据外键值直接定位维表中对应的数组单元，将传统的连接操作简化为数组下标访问，进行AIR算法的OLAP查询处理；其中AIR为数组下标访问；3)将基于数组存储和AIR算法的OLAP查询处理分解为三个顺序的数据访问过程：维表访问、事实表外键访问和事实表度量属性访问，三个阶段产生的中间数据结构包括：维表过滤分组向量、选择向量和分组向量、分组多维数组；维表过滤分组向量、选择向量和分组向量为各个查询共用的数据结构，不同的查询只需要更新向量的内容，分组多维数组根据查询的不同而动态生成；4)采用在基于关键字的位图连接索引的基础上在DRAM-flash两级存储中优化存储K个关键字连接位图，从中选择n个高频访问关系字并将相应的位图存储于DRAM，其余的位图存储于flash，形成二级连接位图索引结构。In order to achieve the above object, the present invention adopts the following technical solutions: a method for optimizing OLAP queries under a memory-flash memory hybrid storage mode, which includes the following steps: 1) OLAP storage adopts a flash-aware storage model, that is, according to OLAP star or In the snowflake model, the dimension table is small and there are many predicate operations, and the fact table is composed of foreign keys and measurement attributes. It is divided according to the locality of data access between relatively small DRAM and relatively large flash storage. , storage optimization is performed on heterogeneous two-level memory; 2) In-memory OLAP uses array storage, and each attribute column is stored in a continuous array unit. The table is composed of attribute arrays of the same length, and the dimension table uses array subscripts as The primary key, the foreign key of the fact table is the data subscript of the corresponding record in the dimension table, and the fact table record can directly locate the corresponding array unit in the dimension table according to the foreign key value, simplifying the traditional connection operation to array subscript access, and performing the AIR algorithm OLAP query processing; where AIR is array subscript access; 3) OLAP query processing based on array storage and AIR algorithm is decomposed into three sequential data access processes: dimension table access, fact table foreign key access and fact table measurement attributes Access, the intermediate data structures generated in three stages include: dimension table filtering grouping vector, selection vector and grouping vector, grouping multidimensional array; dimension table filtering grouping vector, selection vector and grouping vector are common data structures for each query, different queries It only needs to update the content of the vector, and the grouped multidimensional array is dynamically generated according to different queries; 4) Based on the keyword-based bitmap connection index, K keyword connection bitmaps are optimally stored in the DRAM-flash two-level storage , select n high-frequency access relation words and store the corresponding bitmaps in DRAM, and store the remaining bitmaps in flash to form a two-level connection bitmap index structure.

所述步骤1)中，采用内存存储引擎，维表驻留DRAM；事实表中的外键是多维分析中星形连接时访问频度较高的列，同样驻留在DRAM中；事实表的度量属性存储于flash中，并提供度量列上按位置访问API支持度量列的随机访问。Described step 1) in, adopt memory storage engine, dimension table resides in DRAM; The foreign key in the fact table is the row that access frequency is higher when star connection in the multidimensional analysis, resides in DRAM equally; Fact table's The metric attributes are stored in flash, and the access API supports random access of metric columns by position on the metric columns.

所述步骤3)中，维表过滤分组向量、选择向量和分组向量、分组多维数组存储于DRAM中；事实表度量属性存储于flash中。In the step 3), the dimension table filtering grouping vector, selection vector and grouping vector, grouping multidimensional array are stored in DRAM; the fact table measurement attributes are stored in flash.

所述步骤2)中，所述AIR算法的OLAP查询处理包括以下步骤：①将OLAP查询分解为维表上的分组过滤操作：将查询中的选择和分组操作按维表进行划分，将查询分解为在各个维表上的子查询；②生成维表过滤分组向量：每个维表根据各自的where子句对记录进行过滤并投影出满足选择条件的分组属性，满足选择条件的分组属性进行字典表压缩，字典表存储于数组中，字典压缩码为字典数组下标，在过滤分组属性中选择条件为false的位置置为-1，否则置为分组压缩编码，并记录在与维表等长的过滤分组向量中；③事实表外键多趟扫描创建选择和分组向量：按维表过滤分组向量选择率由低到高的顺序依次扫描事实表对应的外键列，每列扫描时按外键值映射到相应维表过滤分组向量指定的位置，向量值非负时将当前事实表记录位置插入选择向量；下一个外键列扫描时按照当前选择向量中的记录位置采用随机访问方式，并通过相应维表过滤分组向量更新选择向量，删除不满足后续外键映射条件的事实表记录位置；当各外键列扫描结束后，选择向量记录了事实表中满足全部连接条件的记录位置；当总选择率较高时，在更新选择向量的同时更新分组向量，用维表过滤分组向量中当前分组的下标增量地计算分组多维数组下标；当总选择率很低时，首先生成选择向量，最后按选择向量的位置随机访问各外键列，一次性生成分组向量；④通过选择向量访问flash存储度量列指定度量值：根据选择向量中的位置访问flash存储上的度量属性值；根据选择向量在flash上的随机访问采用多核并行执行；⑤通过分组向量将度量值在分组多维数组中聚集：按照分组向量所记录的分组下标，将从flash存储上返回的度量属性值“推”到分组向量中下标值指示的分组多维数组单元中进行聚集计算，完成全部的OLAP查询处理；⑥将分组多维数组的各维反向映射到分组字典表中获得分组属性值并输出查询结果。In the described step 2), the OLAP query processing of the AIR algorithm comprises the following steps: 1. the OLAP query is decomposed into grouping and filtering operations on the dimension table: the selection and grouping operations in the query are divided by the dimension table, and the query is decomposed into It is a subquery on each dimension table; ② Generate a dimension table filter grouping vector: each dimension table filters records according to their respective where clauses and projects the grouping attributes that meet the selection conditions, and the grouping attributes that meet the selection conditions are converted into a dictionary Table compression, the dictionary table is stored in the array, the dictionary compression code is the subscript of the dictionary array, in the filter group attribute, the position where the selection condition is false is set to -1, otherwise it is set to the group compression code, and recorded in the same length as the dimension table ③ Create selection and grouping vectors by multiple scans of foreign keys in the fact table: scan the foreign key columns corresponding to the fact table in order of selection rate from low to high according to the selection rate of dimension table filtering grouping vectors, and scan each column by foreign key The key value is mapped to the position specified by the filter grouping vector of the corresponding dimension table. When the vector value is non-negative, insert the record position of the current fact table into the selection vector; when scanning the next foreign key column, the random access method is adopted according to the record position in the current selection vector, and Update the selection vector by filtering the grouping vector of the corresponding dimension table, and delete the record positions of the fact table that do not meet the subsequent foreign key mapping conditions; When the total selection rate is high, update the grouping vector while updating the selection vector, and use the dimension table to filter the subscript of the current grouping in the grouping vector to incrementally calculate the grouping multidimensional array subscript; when the total selection rate is very low, first generate the selection vector, and finally randomly access each foreign key column according to the position of the selected vector, and generate a grouping vector at one time; ④Access the flash storage metric column to specify the metric value by selecting the vector: access the metric attribute value on the flash storage according to the position in the selected vector; The random access of the selection vector on the flash adopts multi-core parallel execution; ⑤Aggregate the measurement values in the multi-dimensional array through the grouping vector: according to the subscript of the grouping recorded by the grouping vector, the value of the measurement attribute returned from the flash storage is "pushed" Perform aggregation calculation in the grouping multidimensional array unit indicated by the subscript value in the grouping vector, and complete all OLAP query processing; ⑥ Reversely map each dimension of the grouping multidimensional array to the grouping dictionary table to obtain the grouping attribute value and output the query result.

所述步骤②中，所述过滤分组向量是一个动态生成的维表附加列，代替维表与事实表进行连接操作并提供连接记录在当前维上分组属性的编码；当维表上没有分组属性时，过滤分组向量简化为一个位图，用于事实表外键的连接过滤。In the step 2., the filter grouping vector is an additional column of a dynamically generated dimension table, which replaces the dimension table and the fact table to perform a connection operation and provides the encoding of the grouping attribute of the connection record on the current dimension; when there is no grouping attribute on the dimension table When , the filter grouping vector is simplified to a bitmap, which is used for the connection filtering of the foreign key of the fact table.

所述步骤4)中，在关键字位图连接索引的基础上采用DRAM-flash两级位图连接索引方法，根据内存存储空间配额精确地在K个关键字连接位图中选择最频繁使用的n个作为DRAM驻留位图连接索引，其余K-n个关键字位图作为二级索引存储在flash中。In described step 4), adopt DRAM-flash two-level bitmap connection index method on the basis of keyword bitmap connection index, select the most frequently used in K keyword connection bitmaps accurately according to memory storage space quota N are connected indexes as DRAM resident bitmaps, and the remaining K-n keyword bitmaps are stored in flash as secondary indexes.

本发明由于采取以上技术方案，其具有以下优点：1、本发明由于采用DRAM-flash两级存储作为内存OLAP的存储平台。相对于完全内存存储模型，DRAM-flash两级存储大幅度降低了对昂贵内存的需求，降低了硬件的整体成本，通过flash-aware存储模型及AIROLAP算法的优化，对flash上存储的大量度量属性采用高效率的随机访问，缩小了flash存储的性能差距。2、本发明采用的AIR OLAP查询处理算法将一个完整的OLAP查询处理过程划分为三个独立的处理阶段，在维表和事实表外键处理阶段只涉及较小比例的数据，每个维表生成一个定长的过滤分组向量；事实表上无论查询涉及的列数量多少都只需要一个选择向量和一个分组向量，选择和分组向量的初始长度由维上最低的选择率决定，在事实表外键的星形连接中长度不断缩短，内存存储开销有限；数据库中存储空间占比最大的度量属性存储在大容量flash中，传统的行式访问(包括行存储以及列存储上的行式访问)OLAP查询处理在事实表全表扫描过程中完成流水线式的哈希连接和哈希分组聚集操作，AIR OLAP查询算法将连接、分组操作和聚集操作分解，不使用传统数据库中常用的全表扫描而是在数量有限的事实表外键上完成连接分组操作，然后根据选择率极低的选择向量按位置访问指定度量列的指定位置，将flash存储上的数据访问操作推到最后处理阶段，极大减少了flash上的数据访问负载，同时也能够发挥出flash良好的随机访问性能。3、本发明由于采用的位图连接索引是一种增量索引，因事实表记录增加而产生的索引更新操作是位图的顺序增长，而不是象B+-树索引那样更新节点内容和结构，能够消除flash上数据更新的数据重写代价，因此更适合flash存储。DRAM-flash两级内存位图连接索引方法在基于关键字位图连接索引的基础上通过查询关键字访问频度以及内存存储空间对关键字位图采用两级存储模式，将K个关键字位图中n个高频使用的关键字位图存储于DRAM，其余K-n个关键字位图存储于flash，降低了内存索引的存储开销。4、本发明是通过OLAP模式和负载的特点以及OLAP查询处理算法中数据的局部性特征(即数据被频繁访问的程度)建立异构存储模型，以表、索引、表中不同数据局部性强度的列组以及查询依赖的中间数据结构为对象，根据内存容量及数据访问性能约束将其优化分布在高性能但容量相对较小的DRAM和大容量但性能相对较低的flash存储中，提高数据存储效率。5、本发明是在查询处理时根据DRAM和flash存储的特性，将flash上的数据访问推后执行，将OLAP完整的查询处理过程划分为内存处理和flash计算两个阶段，支持OLAP查询间不同存储访问阶段上的流水并行，提高内存和CPU使用效率。本发明可以广泛应用于通用OLAP应用场景中。Because the present invention adopts the above technical scheme, it has the following advantages: 1. The present invention adopts DRAM-flash two-level storage as the storage platform of memory OLAP. Compared with the full memory storage model, DRAM-flash two-level storage greatly reduces the demand for expensive memory and reduces the overall cost of hardware. Through the optimization of the flash-aware storage model and the AIROLAP algorithm, a large number of measurement attributes stored on the flash High-efficiency random access is used to narrow the performance gap of flash storage. 2. The AIR OLAP query processing algorithm adopted in the present invention divides a complete OLAP query processing process into three independent processing stages, and only involves a small proportion of data in the dimension table and fact table foreign key processing stage, each dimension table Generate a fixed-length filtering grouping vector; no matter how many columns are involved in the query on the fact table, only one selection vector and one grouping vector are needed. The initial length of the selection and grouping vectors is determined by the lowest selection rate on the dimension, and outside the fact table The length of the star connection of the key is continuously shortened, and the memory storage overhead is limited; the metric attribute with the largest storage space in the database is stored in the large-capacity flash, and the traditional row access (including row storage and row access on the column storage) OLAP query processing completes pipelined hash join and hash group aggregation operations during the full table scan of the fact table. The AIR OLAP query algorithm decomposes the join, group operation, and It is to complete the connection grouping operation on a limited number of foreign keys of the fact table, and then access the specified position of the specified measurement column according to the position according to the selection vector with a very low selectivity rate, and push the data access operation on the flash storage to the final processing stage, which is extremely large. The data access load on the flash is reduced, and the good random access performance of the flash can also be brought into play. 3. The bitmap connection index adopted by the present invention is a kind of incremental index, and the index update operation generated because of the fact table record increase is the sequential growth of the bitmap, instead of updating the node content and structure like the B+-tree index, It can eliminate the data rewriting cost of data update on flash, so it is more suitable for flash storage. The DRAM-flash two-level memory bitmap connection index method adopts a two-level storage mode for the keyword bitmap by querying the keyword access frequency and memory storage space based on the keyword bitmap connection index. In the figure, n frequently used keyword bitmaps are stored in DRAM, and the remaining K-n keyword bitmaps are stored in flash, which reduces the storage overhead of the memory index. 4. The present invention establishes a heterogeneous storage model through the characteristics of OLAP patterns and loads and the locality characteristics of data in the OLAP query processing algorithm (that is, the degree of frequent access to data), and uses different data locality strengths in tables, indexes, and tables. The column group and the intermediate data structure that the query depends on are used as objects. According to the memory capacity and data access performance constraints, they are optimized and distributed in high-performance but relatively small-capacity DRAM and large-capacity but relatively low-performance flash storage to improve data quality. storage efficiency. 5. According to the characteristics of DRAM and flash storage during query processing, the present invention delays the execution of data access on the flash, divides the complete query processing process of OLAP into two stages of memory processing and flash calculation, and supports different OLAP queries. Pipeline parallelism in the storage access stage improves memory and CPU usage efficiency. The present invention can be widely used in general OLAP application scenarios.

附图说明Description of drawings

图1是本发明的内存OLAP在DRAM-flash两级存储上的存储示意图；Fig. 1 is the storage schematic diagram of memory OLAP of the present invention on DRAM-flash two-level storage;

图2是本发明实施例中内存OLAP在DRAM-flash两级存储上的查询处理示意图；Fig. 2 is a schematic diagram of query processing of memory OLAP on DRAM-flash two-level storage in an embodiment of the present invention;

图3是本发明的DRAM-flash两级存储上的关键字位图连接索引处理示意图。FIG. 3 is a schematic diagram of key bitmap connection index processing on the DRAM-flash two-level storage of the present invention.

具体实施方式detailed description

现有的内存OLAP技术通常采用完全内存计算模式，或者将大容量flash用作高速缓存，前者增加了内存计算的成本，后者难以将OLAP优化从内存扩展到flash上。为此，本发明提出了一种基于DRAM-flash两级存储的内存-闪存混合存储模式下的OLAP查询优化方法，根据OLAP模式、负载以及OLAP算法的特点优化OLAP数据在内存和flash上的存储，面向两级存储特点优化OLAP查询算法。本发明适用于通用OLAP应用场景。下面结合附图和实施例对本发明进行详细的描述。Existing in-memory OLAP technologies usually adopt a full-memory computing mode, or use large-capacity flash as a cache. The former increases the cost of in-memory computing, and the latter makes it difficult to extend OLAP optimization from memory to flash. For this reason, the present invention proposes an OLAP query optimization method based on DRAM-flash two-level storage memory-flash memory hybrid storage mode, and optimizes the storage of OLAP data on memory and flash according to the characteristics of OLAP mode, load and OLAP algorithm , to optimize the OLAP query algorithm for the characteristics of two-level storage. The present invention is applicable to general OLAP application scenarios. The present invention will be described in detail below in conjunction with the accompanying drawings and embodiments.

如图1所示，本发明提供一种内存-闪存混合存储模式下的OLAP查询优化方法，该方法基于DRAM-flash两级存储，面向采用DRAM和大容量PCIe Flash Card构成的DRAM-flash两级存储上的内存OLAP查询优化方法，其包括以下步骤：As shown in Figure 1, the present invention provides an OLAP query optimization method under a memory-flash hybrid storage mode, which is based on DRAM-flash two-level storage, and is oriented to DRAM-flash two-level storage composed of DRAM and large-capacity PCIe Flash Card A memory OLAP query optimization method on storage, which includes the following steps:

1)OLAP存储采用flash-aware的存储模型，即根据OLAP星形或雪花形模型中维表较小、谓词操作较多的特点和事实表由外键和度量属性组成的特点，在相对较小的DRAM和相对较大flash存储之间按数据访问的局部性进行划分，在异构的两级内存上进行存储优化，以提高内存中高频使用数据的效率。1) OLAP storage adopts the flash-aware storage model, that is, according to the characteristics of small dimension tables and many predicate operations in OLAP star or snowflake models and the characteristics of fact tables composed of foreign keys and measurement attributes, relatively small The DRAM and relatively large flash storage are divided according to the locality of data access, and storage optimization is performed on the heterogeneous two-level memory to improve the efficiency of high-frequency use of data in the memory.

由于维表较小且现代OLAP支持维表上的更新，因此采用内存存储引擎，维表驻留DRAM。事实表中的外键是多维分析中星形连接时访问频度较高的列，同样驻留在DRAM中。事实表的度量属性较多，OLAP查询通常只针对少数度量属性和极低选择率上的度量值，因此存储于廉价大容量的flash中，并提供度量列上按位置访问API支持度量列的按位置随机访问。Since dimension tables are small and modern OLAP supports updates on dimension tables, an in-memory storage engine is used, and dimension tables reside in DRAM. The foreign key in the fact table is a column with high access frequency in the star connection in multidimensional analysis, and it also resides in DRAM. Fact tables have many metric attributes, and OLAP queries are usually only for a few metric attributes and metric values with a very low selection rate. Therefore, they are stored in cheap and large-capacity flash, and an API is provided to support metric column access by location on the metric column. Location random access.

2)内存OLAP采用数组存储，每个属性列存储于连续的数组单元中，表由相同长度的各属性数组组成，其中较优地，维表使用数组下标作为主键，事实表外键为维表中相应记录的数据下标，事实表记录能够根据外键值直接定位维表中对应的数组单元，将传统的连接操作简化为数组下标访问(Arra_y Index Reference，AIR)，进行AIR算法的OLAP查询处理。2) In-memory OLAP adopts array storage, and each attribute column is stored in a continuous array unit. The table is composed of attribute arrays of the same length. Preferably, the dimension table uses the array subscript as the primary key, and the foreign key of the fact table is dimension The data subscript of the corresponding record in the table, and the fact table record can directly locate the corresponding array unit in the dimension table according to the foreign key value, simplifying the traditional connection operation to the array subscript access ( _Array Index Reference, AIR), and performing the AIR algorithm OLAP query processing.

3)将基于数组存储和AIR算法的OLAP查询处理分解为三个顺序的数据访问过程：维表访问、事实表外键访问和事实表度量属性访问，三个阶段产生的中间数据结构包括：维表过滤分组向量、选择向量和分组向量、分组多维数组。维表过滤分组向量、选择向量和分组向量为各个查询共用的数据结构，不同的查询只需要更新向量的内容；分组多维数组根据查询的不同而动态生成。这三类中间数据结构在查询处理过程中被重复使用，属于强局部性数据集，需要存储于DRAM中；事实表度量属性所占存储空间较大，但查询中通常只访问较少的度量属性，而且通过选择向量只需要随机访问度量属性列中极低比例的记录，因此度量属性列可以存储于flash中以减少内存OLAP对DRAM的需求，降低系统的硬件成本。3) Decompose the OLAP query processing based on array storage and AIR algorithm into three sequential data access processes: dimension table access, fact table foreign key access, and fact table measurement attribute access. The intermediate data structures generated in the three stages include: dimension table access Table filtering grouping vectors, selecting vectors and grouping vectors, grouping multidimensional arrays. Dimension table filtering grouping vector, selection vector and grouping vector are data structures shared by all queries. Different queries only need to update the content of the vector; grouping multidimensional arrays are dynamically generated according to different queries. These three types of intermediate data structures are reused in the query processing process and belong to strong locality data sets, which need to be stored in DRAM; the fact table measurement attributes occupy a large storage space, but usually only a small number of measurement attributes are accessed in the query , and only a very low proportion of records in the metric attribute column are randomly accessed by selecting the vector, so the metric attribute column can be stored in flash to reduce the demand for DRAM in memory OLAP and reduce the hardware cost of the system.

本发明所采用的AIR OLAP算法需要选择向量、分组向量、维表过滤分组向量、分组多维数组等数据用于OLAP查询处理过程，这些数据结构在OLAP查询间可以重复使用，所占内存空间固定，因此常驻于DRAM。The AIR OLAP algorithm adopted in the present invention needs to select data such as vectors, grouping vectors, dimension table filtering grouping vectors, and grouping multidimensional arrays for the OLAP query processing process. These data structures can be reused between OLAP queries, and the occupied memory space is fixed. Therefore, it is resident in DRAM.

4)数据仓库通常使用位图连接索引来优化或消除维表和事实表之间的连接代价。本发明采用在基于关键字的位图连接索引(即按关键字而不是整个属性成员来建立位图连接索引)的基础上在DRAM-flash两级存储中优化存储K个关键字连接位图(即每人关键字对应一个与事实表等长的位图，记录该关键字所对应的事实表记录位置)，从中选择n个高频访问关系字并将相应的位图存储于DRAM，其余的位图存储于flash，形成二级连接位图索引结构；4) Data warehouses usually use bitmap join indexes to optimize or eliminate join costs between dimension tables and fact tables. The present invention adopts on the basis of keyword-based bitmap connection index (that is, set up bitmap connection index by keyword instead of whole attribute members) in DRAM-flash two-level storage to optimize and store K keyword connection bitmaps ( That is, each keyword corresponds to a bitmap with the same length as the fact table, record the fact table record position corresponding to the keyword), select n high-frequency access relation words and store the corresponding bitmap in DRAM, and the rest The bitmap is stored in flash to form a secondary link bitmap index structure;

其中，本发明在关键字位图连接索引的基础上采用DRAM-flash两级位图连接索引方法，根据内存存储空间配额精确地在K个关键字连接位图中选择最频繁使用的n个作为DRAM驻留位图连接索引，其余K-n个关键字位图作为二级索引。具体为：索引项为某个维表属性值所对应的事实表位置位图，位图索引的大小为定长的位图大小与关键字数量的乘积，位图索引可以通过数据压缩进一步缩减内存存储空间。通过查询执行日志和查询关键字分析，可以确定K个最频繁使用的维属性关键字并为其建立位图索引，根据内存空间配额可以将其中n个最频繁使用的关键字位图驻留于内存，其余K-n个关键字位图存储于flash，形成二级连接位图索引结构。Among them, the present invention adopts the DRAM-flash two-level bitmap connection index method on the basis of the keyword bitmap connection index, and accurately selects the most frequently used n among the K keyword connection bitmaps according to the memory storage space quota as The DRAM resides in the bitmap connection index, and the remaining K-n keyword bitmaps are used as secondary indexes. Specifically: the index item is the bitmap of the fact table location corresponding to the attribute value of a certain dimension table. The size of the bitmap index is the product of the fixed-length bitmap size and the number of keywords. The bitmap index can further reduce the memory through data compression. storage. By querying execution logs and query keyword analysis, the K most frequently used dimension attribute keywords can be determined and a bitmap index can be established for them. According to the memory space quota, the n most frequently used keyword bitmaps can reside in memory, and the remaining K-n keyword bitmaps are stored in flash to form a secondary connection bitmap index structure.

上述步骤2)中，AIR算法的OLAP查询处理包括以下步骤：In the above step 2), the OLAP query processing of the AIR algorithm comprises the following steps:

①将OLAP查询分解为维表上的分组过滤操作：将查询中的选择和分组操作按维表进行划分，将查询分解为在各个维表上的子查询。① Decompose the OLAP query into grouping and filtering operations on dimension tables: divide the selection and grouping operations in the query into dimension tables, and decompose the query into subqueries on each dimension table.

②生成维表过滤分组向量：每个维表根据各自的where子句对记录进行过滤并投影出满足选择条件的分组属性，满足选择条件的分组属性进行字典表压缩，字典表存储于数组中，字典压缩码为字典数组下标，在过滤分组属性中选择条件为false的位置置为-1，否则置为分组压缩编码，并记录在与维表等长的过滤分组向量中。②Generate dimension table filter grouping vector: each dimension table filters records according to their respective where clauses and projects the grouping attributes that meet the selection conditions. The grouping attributes that meet the selection conditions are compressed in the dictionary table, and the dictionary table is stored in the array. The dictionary compression code is the subscript of the dictionary array. In the filter group attribute, the position where the selection condition is false is set to -1, otherwise it is set to the group compression code and recorded in the filter group vector with the same length as the dimension table.

过滤分组向量是一个动态生成的维表附加列，代替维表与事实表进行连接操作并提供连接记录在当前维上分组属性的编码。当维表上没有分组属性时，过滤分组向量简化为一个位图，用于事实表外键的连接过滤。The filter grouping vector is a dynamically generated additional column of the dimension table, which replaces the dimension table and the fact table for the connection operation and provides the encoding of the grouping attribute of the connection record on the current dimension. When there is no grouping attribute on the dimension table, the filter grouping vector is simplified to a bitmap, which is used for connection filtering of the foreign key of the fact table.

③事实表外键多趟扫描创建选择和分组向量：按维表过滤分组向量选择率由低到高的顺序依次扫描事实表对应的外键列，每列扫描时按外键值映射到相应维表过滤分组向量指定的位置，向量值非负时将当前事实表记录位置插入选择向量；下一个外键列扫描时按照当前选择向量中的记录位置采用随机访问方式，并通过相应维表过滤分组向量更新选择向量，删除不满足后续外键映射条件的事实表记录位置；当各外键列扫描结束后，选择向量记录了事实表中满足全部连接条件的记录位置。当总选择率较高时，在更新选择向量的同时更新分组向量，用维表过滤分组向量中当前分组的下标增量地计算分组多维数组下标；当总选择率很低时，首先生成选择向量，最后按选择向量的位置随机访问各外键列，一次性生成分组向量。③Multiple scans of foreign keys in fact table to create selection and grouping vectors: filter grouping vectors according to dimension tables, scan the foreign key columns corresponding to fact tables in order of selection rate from low to high, and map each column to the corresponding dimension according to the foreign key value when scanning The position specified by the table filtering grouping vector, when the vector value is non-negative, insert the current fact table record position into the selection vector; when scanning the next foreign key column, use random access method according to the record position in the current selection vector, and filter the grouping through the corresponding dimension table The vector updates the selection vector, and deletes the record positions of the fact table that do not meet the subsequent foreign key mapping conditions; after the scanning of each foreign key column is completed, the selection vector records the record positions in the fact table that meet all the connection conditions. When the total selection rate is high, update the grouping vector while updating the selection vector, and use the dimension table to filter the subscript of the current grouping in the grouping vector to incrementally calculate the grouping multidimensional array subscript; when the total selection rate is low, first generate Select the vector, and finally randomly access each foreign key column according to the position of the selected vector, and generate a grouping vector at one time.

④通过选择向量访问flash存储度量列指定度量值：根据选择向量中的位置访问flash存储上的度量属性值，只需要返回极少的度量值进行分组聚集计算；flash具有良好的并行随机访问性能，根据选择向量在flash上的随机访问采用多核并行执行，减少flash数据访问延迟。④ Specify the metric value by selecting the vector to access the metric column of the flash storage: access the metric attribute value on the flash storage according to the position in the selection vector, and only need to return a very small number of metric values for grouping and aggregation calculation; flash has good parallel random access performance, The random access on the flash according to the selection vector adopts multi-core parallel execution to reduce the flash data access delay.

⑤通过分组向量将度量值在分组多维数组中聚集：按照分组向量所记录的分组下标，将从flash存储上返回的度量属性值“推”到分组向量中下标值指示的分组多维数组单元中进行聚集计算，完成全部的OLAP查询处理。⑤Aggregate the metric values in the grouped multidimensional array through the grouping vector: according to the grouping subscript recorded in the grouping vector, "push" the metric attribute value returned from the flash storage to the grouping multidimensional array unit indicated by the subscript value in the grouping vector Aggregation calculations are performed in the database to complete all OLAP query processing.

⑥将分组多维数组的各维反向映射到分组字典表中获得分组属性值并输出查询结果。⑥ Reversely map each dimension of the grouped multidimensional array to the grouping dictionary table to obtain the grouping attribute value and output the query result.

实施例：Example:

如图2所示，flash-aware体现在通过对OLAP查询算法的优化将对flash上存储的度量属性的访问推到最后，将事实表顺序扫描转换为根据选择向量对事实表度量属性执行低选择率的随机访问，降低flash上度量属性访问的延迟。As shown in Figure 2, flash-aware is reflected in the optimization of the OLAP query algorithm to push the access to the metric attributes stored on the flash to the end, and convert the sequential scan of the fact table to perform low selection on the metric attributes of the fact table according to the selection vector High-rate random access, reducing the latency of metric attribute access on flash.

步骤2)以下面的查询命令为例：Step 2) Take the following query command as an example:

SELECT c_nation,s_nation,sum(l_revenue),sum(l_price)SELECT c_nation, s_nation, sum(l_revenue), sum(l_price)

FROM customer,lineorder,supplierFROM customer, line order, supplier

WHERE lo_custkey＝c_custkeyWHERE lo_custkey=c_custkey

and lo_suppkey＝s_suppkeyand lo_suppkey=s_suppkey

and c_region＝'AMERICA'and c_region='AMERICA'

and s_region＝'ASIA'and s_region='ASIA'

group by c_nation,s_nation；group by c_nation, s_nation;

查询命令需要将事实表l ineorder和维表customer，supplier连接，然后按维表的c_region和s_region属性对事实表度量列l_revenue和l_price求累加和。The query command needs to connect the fact table lineorder with the dimension tables customer and supplier, and then calculate the cumulative sum of the fact table measurement columns l_revenue and l_price according to the c_region and s_region attributes of the dimension table.

如图3所示，本发明步骤4)中DRAM-flash两级存储上的关键字位图连接索引处理实施例：As shown in Figure 3, the keyword bitmap connection index processing embodiment on the DRAM-flash two-level storage in step 4) of the present invention:

传统的数据库中，索引是一种平面结构，假设在相同的存储层次上使用。B+-树索引等磁盘数据库的索引技术通过缓冲区机制实现内存缺失节点在磁盘和内存的数据交换，但仍然是一种不透明的机制，其索引访问效率取决于缓冲区替换算法的效率，不能被数据库定制式优化。在OLAP应用中，索引项存在于维表属性，但索引的对象则是事实表中对应的连接记录，普通的B+-树索引只能检索单表上的数据，对OLAP应用而言，较小的维表上的索引访问性能的提高对OLAP查询整体性能的加速作用有限，而对事实表外键建立索引能够加速事实表与维表之间的连接操作，但事实表外键上的索引存储开销极为庞大，而且现代OLAP应用中事实表需要实时大量更新，索引的更新代价巨大。本发明采用的是位图连接索引方法，即通过连接操作为维表上指定的属性成员建立事实表连接位图，每个成员建立一个位图，指示该成员在事实表记录上的连接位置。位图连接索引是一种增量索引，在事实表大量数据插入时只需要增量地扩展位图长度即可，不需要对已建立的连接位图进行重构。传统数据库中使用的位图连接索引是以属性为粒度为所有成员建立连接位图，面临着低势集属性位图连接索引空间开销小但选择率过高，而高势集属性选择率低但连接位图数量多，存储开销过大的矛盾。本发明采用按查询负载中的维属性关键字的TOP K访问频度选择位图连接索引关键字，并为K个全局高频访问关键字建立全局位图连接索引，形成一个key/value索引结构，key为关键字的全局名称，包括表及关键字信息，value则是连接位图。不同维表、不同属性的关键字具有相同的连接位图长度，可以存储在全局位图连接索引中。其中较优地，本发明在关键字位图连接索引的基础上提出了DRAM-flash两级位图连接索引方法，根据内存存储空间配额精确地在K个关键字连接位图中选择最频繁使用的n个作为DRAM驻留位图连接索引，其余K-n个关键字位图作为二级索引。如图3所示，显示了查询关键字位图分别存在于DRAM和flash两级存储的应用场景，首先DRAM中的位图完成选择操作，生成过滤位图，当过滤位图的选择率在某个阈值范围(S_low,S_high)之间时，按照生成的过滤位图中“1”的位置向flash存储中相应位图的指定位置进行随机访问，确定该位置最终的逻辑结果。Flash具有较好的随机访问能力，本发明采用对过滤位图顺序访问，对flash位图并行访问的策略，提高flash上数据的并行访问性能，减少位图索引计算的延迟，如图3中所示的并行flash访问线程。假设DRAM中过滤位图的选择率为S₁，flash中位图的选择率为S₂(位图的选择率可以由“1”的数量精确地给出)，T_F(S₁)为在flash上的位图访问延迟，T_P(S₂,S₁)为在索引选择率为S₂和S₁时查询处理的时间差，当T_P(S₂,S₁)-T_F(S₁)>0时，flash上的位图访问具有查询性能收益。S_low和S_high为满足查询收益的最低和最高选择率差值(如S₂-S₁)。In a traditional database, an index is a flat structure that is assumed to be used on the same storage level. The index technology of disk database such as B+-tree index realizes the data exchange between memory missing nodes on disk and memory through the buffer mechanism, but it is still an opaque mechanism, and its index access efficiency depends on the efficiency of the buffer replacement algorithm, which cannot be Database customization optimization. In OLAP applications, index items exist in dimension table attributes, but the index objects are the corresponding connection records in the fact table. Ordinary B+-tree indexes can only retrieve data on a single table. For OLAP applications, it is relatively small. The improvement of the index access performance on the dimension table has a limited effect on the acceleration of the overall performance of OLAP queries, and the establishment of an index on the foreign key of the fact table can speed up the connection operation between the fact table and the dimension table, but the index storage on the foreign key of the fact table The overhead is extremely large, and the fact table in modern OLAP applications needs to be updated in real time, and the update cost of the index is huge. The present invention adopts a bitmap connection index method, that is, establishes a fact table connection bitmap for specified attribute members on the dimension table through connection operations, and establishes a bitmap for each member to indicate the connection position of the member on the fact table record. The bitmap join index is a kind of incremental index. When a large amount of data is inserted into the fact table, it only needs to expand the length of the bitmap incrementally, and there is no need to reconstruct the established join bitmap. The bitmap join index used in the traditional database is to establish a join bitmap for all members at the granularity of attributes. Faced with low potential set attribute bitmap join index, the space cost is small but the selection rate is too high, while the high potential set attribute has a low selectivity rate but The number of connection bitmaps is large, and the storage overhead is too large. The present invention selects the bitmap connection index keywords according to the TOP K access frequency of the dimension attribute keywords in the query load, and establishes a global bitmap connection index for K global high-frequency access keywords to form a key/value index structure , key is the global name of the keyword, including table and keyword information, and value is the connection bitmap. Keywords of different dimension tables and different attributes have the same connection bitmap length, which can be stored in the global bitmap connection index. Wherein preferably, the present invention proposes a DRAM-flash two-level bitmap connection index method on the basis of the keyword bitmap connection index, and accurately selects the most frequently used keywords in the K keyword connection bitmaps according to the memory storage space quota The n of Kn are used as DRAM resident bitmap connection indexes, and the remaining Kn keyword bitmaps are used as secondary indexes. As shown in Figure 3, it shows the application scenario where the query keyword bitmap exists in DRAM and flash respectively. First, the bitmap in DRAM completes the selection operation to generate a filter bitmap. When the selection rate of the filter bitmap is at a certain When between the threshold ranges (S _low , S _high ), perform random access to the specified position of the corresponding bitmap in the flash storage according to the position of "1" in the generated filter bitmap, and determine the final logical result of the position. Flash has better random access ability, and the present invention adopts to filtering bitmap sequential access, to the strategy of flash bitmap parallel access, improves the parallel access performance of data on the flash, reduces the delay of bitmap index calculation, as shown in Figure 3 Parallel flash access threads shown. Assuming that the selectivity rate of bitmap filtering in DRAM is S ₁ , and the selectivity rate of bitmap in flash is S ₂ (the selectivity rate of bitmap can be precisely given by the number of "1"), T _F (S ₁ ) is in Bitmap access delay on flash, T _P (S ₂ ,S ₁ ) is the query processing time difference between index selectivity S ₂ and S ₁ , when T _P (S ₂ ,S ₁ )-T _F (S ₁ )>0, bitmap access on flash has query performance benefits. S _low and S _high are the difference between the lowest and highest selectivity ratios (such as S ₂ -S ₁ ) that satisfy the query benefit.

综上所述，与现有技术相比，本发明采用基于DRAM-flash两级存储的内存OLAP查询优化方法，支持DRAM-flash两级存储上的内存OLAP查询处理，降低内存OLAP对昂贵内存的需求，提高内存OLAP的性价比。通过基于DRAM-flash两级存储的存储优化方法、OLAP查询优化方法和索引优化方法，透明地优化在OLAP查询处理过程中的数据访问和查询处理性能。本发明同时提高了内存OLAP大数据的访问数据管理能力和查询处理性能。In summary, compared with the prior art, the present invention adopts a memory OLAP query optimization method based on DRAM-flash two-level storage, supports memory OLAP query processing on DRAM-flash two-level storage, and reduces the impact of memory OLAP on expensive memory. demand and improve the cost performance of in-memory OLAP. Through the storage optimization method based on DRAM-flash two-level storage, OLAP query optimization method and index optimization method, the data access and query processing performance in the OLAP query processing process are transparently optimized. The invention simultaneously improves the access data management capability and query processing performance of the memory OLAP big data.

上述各实施例仅用于说明本发明，其中各部件的结构、连接方式和制作工艺等都是可以有所变化的，凡是在本发明技术方案的基础上进行的等同变换和改进，均不应排除在本发明的保护范围之外。The above-mentioned embodiments are only used to illustrate the present invention, wherein the structure, connection mode and manufacturing process of each component can be changed to some extent, and any equivalent transformation and improvement carried out on the basis of the technical solution of the present invention should not excluded from the protection scope of the present invention.

Claims

1. a kind of internal memory-flash memory mixes the OLAP query optimization method under memory module, and it is comprised the following steps：

1) OLAP storage using flash-aware storage model, i.e., according to dimension table in OLAP stars or snowflake shape model it is smaller, The characteristics of the characteristics of predicate operation is more and true table are made up of external key and metric attribute, in relatively small DRAM and relatively Divided by the locality of data access between big flash storages, storage optimization is carried out on the two-stage internal memory of isomery；

2) memory OLAP uses storage of array, and each attribute column is stored in continuous array location, and table is by each of equal length Attribute array is constituted, and dimension table uses array index as major key, and true off-balancesheet key is the data subscript of respective record in dimension table, thing The record of real token can directly position corresponding array location in dimension table according to foreign key value, and traditional attended operation is reduced into array Subscript is accessed, and carries out the OLAP query treatment of AIR algorithms；Wherein AIR algorithms are array index access；

3) the OLAP query treatment based on storage of array and AIR algorithms is decomposed into three data access processes of order：Dimension table Access, true off-balancesheet key is accessed and true metric table attribute access, the intermediate data structure that three phases are produced includes：Dimension table mistake Filter packet vector, selection vector sum packet vector, packet Multidimensional numerical；Dimension table filter packets vector, selection vector sum be grouped to It is the shared data structure of each inquiry to measure, and different inquiries only needs to the content of renewal vector, is grouped Multidimensional numerical according to looking into Inquiry different and dynamic is generated；

4) using optimization storage K in DRAM-flash two-level memories on the basis of the bit join index based on keyword Keyword connects bitmap, therefrom selects n high frequency access relation word and corresponding bitmap is stored in into DRAM, and remaining bitmap is deposited Flash is stored in, two grades of connection bitmap index structures are formed.

2. a kind of internal memory-flash memory as claimed in claim 1 mixes the OLAP query optimization method under memory module, and its feature exists In：The step 1) in, using memory storage engine, dimension table is resident DRAM；External key in true table is star in multidimensional analysis Visiting frequency row higher during connection, in equally residing in DRAM；The metric attribute of true table is stored in flash, and is provided Opsition dependent accesses the random access that API supports measure column on measure column.

3. a kind of internal memory-flash memory as claimed in claim 1 mixes the OLAP query optimization method under memory module, and its feature exists In：The step 3) in, dimension table filter packets vector, selection vector sum packet vector, packet Multidimensional numerical are stored in DRAM； True metric table attribute is stored in flash.

4. a kind of internal memory-flash memory as claimed in claim 2 mixes the OLAP query optimization method under memory module, and its feature exists In：The step 3) in, dimension table filter packets vector, selection vector sum packet vector, packet Multidimensional numerical are stored in DRAM； True metric table attribute is stored in flash.

5. a kind of internal memory as claimed in claim 1 or 2 or 3 or 4-flash memory mixes the OLAP query optimization side under memory module Method, it is characterised in that：The step 2) in, the OLAP query treatment of the AIR algorithms is comprised the following steps：

1. OLAP query is decomposed into the packet filtering operation on dimension table：Selection in inquiry and division operation are carried out by dimension table Divide, be the subquery on each dimension table by query decomposition；

2. generation dimension table filter packets are vectorial：Each dimension table is filtered and is projected out according to respective where clause to record Meet the packet attributes of alternative condition, the packet attributes for meeting alternative condition carry out dictionary table compression, and dictionary table is stored in array In, dictionary compression code is dictionary array index, and alternative condition is set to -1 for the position of false in filter packets attribute, otherwise Packed compressed coding is set to, and is recorded in the filter packets vector isometric with dimension table；

3. true many times scanning of off-balancesheet key creates selection and packet vector：By dimension table filter packets vector selection rate from low to high Order scans the corresponding foreign key column of true table successively, is mapped to corresponding dimension table filter packets vector by foreign key value during per column scan and refers to Fixed position, will the current fact table record position insertion selection vector during vector value non-negative；During next external key column scan according to Record position in current selection vector uses random access manner, and by corresponding dimension table filter packets vector update selection to Amount, deletes the fact that be unsatisfactory for follow-up external key mapping condition table record position；After each foreign key column end of scan, selection vector note The record position of the whole condition of contacts of satisfaction in true table is recorded；When total selection rate is higher, while selection vector is updated Packet vector is updated, packet Multidimensional numerical subscript is incrementally calculated with the subscript of current group in dimension table filter packets vector；When When always selection rate is very low, selection vector is firstly generated, it is disposably raw finally by each foreign key column of position random access of selection vector Into packet vector；

4. measure column specified metric value is stored by selecting vector to access flash：Position in selection vector accesses flash Metric attribute value in storage；Performed using multi-core parallel concurrent according to random access of the selection vector on flash；

5. metric is assembled in Multidimensional numerical is grouped by being grouped vector：According to the packet subscript that packet vector is recorded, To be stored from flash return metric attribute value " pushing away " to packet vector in subscript value indicate packet Multidimensional numerical unit in Aggregation computation is carried out, whole OLAP query treatment are completed；

6. each dimension back mapping of Multidimensional numerical will be grouped to acquisition packet attributes value in packet dictionary table and Query Result will be exported.

6. a kind of internal memory-flash memory as claimed in claim 5 mixes the OLAP query optimization method under memory module, and its feature exists In：The step 2. in, the filter packets vector be one dynamic generation dimension table additional column, instead of dimension table with the fact table enter Row attended operation simultaneously provides the coding of linkage record packet attributes on leading dimension；When there is no packet attributes on dimension table, filtering Packet vector is reduced to a bitmap, for the connection filtering of true off-balancesheet key.

7. the OLAP query optimization under a kind of internal memory as described in claim 1 or 2 or 3 or 4 or 6-flash memory mixing memory module Method, it is characterised in that：The step 4) in, the levels of DRAM-flash two are used on the basis of keyword bit join index Figure join index method, accurately selects most frequently used according to memory storage space quota in K keyword connection bitmap N be resident bit join index as DRAM, remaining K-n keyword bitmap is as secondary index storage in flash.

8. a kind of internal memory-flash memory as claimed in claim 5 mixes the OLAP query optimization method under memory module, and its feature exists In：The step 4) in, DRAM-flash two-stages bit join index side is used on the basis of keyword bit join index Method, accurately most frequently used n is selected as DRAM according to memory storage space quota in K keyword connection bitmap Resident bit join index, remaining K-n keyword bitmap is stored in flash as secondary index.