CN103309958B

CN103309958B - OLAP star-join query optimization method under a GPU and CPU hybrid architecture

Info

Publication number: CN103309958B
Application number: CN201310204514.4A
Authority: CN
Inventors: 张延松; 张宇
Original assignee: Renmin University of China
Current assignee: Renmin University of China
Priority date: 2013-05-28
Filing date: 2013-05-28
Publication date: 2016-06-29
Anticipated expiration: 2033-05-28
Also published as: CN103309958A

Abstract

The invention discloses an OLAP star-join query optimization method under a GPU and CPU hybrid architecture, comprising the following steps. First, the OLAP star-join operation is optimized through bitmap join index filtering, with frequently accessed join bitmaps cached in the GPU cache. Second, the fact table foreign key attribute groups that satisfy the join bitmap filter condition are loaded into the GPU cache for star-join filtering. Finally, the filter bitmap generated by the GPU converts the full table scan of the large in-memory fact table into random access by position, improving the query processing performance of the OLAP star join. The invention improves the storage efficiency of the GPU cache and the parallel processing efficiency of the GPU, and improves the overall OLAP query processing performance of the hybrid processor platform.

Description

OLAP star-join query optimization method under a GPU and CPU hybrid architecture

Technical field

The invention relates to a data warehouse query processing method, and in particular to a method that optimizes complex multi-dimensional query processing under a GPU and CPU hybrid architecture by using a general-purpose GPU as a join bitmap storage and processing engine. It belongs to the technical field of database management.

Background art

At present, microprocessor technology is developing along two main lines: multi-core general-purpose processors and many-core coprocessors. Multi-core general-purpose processors, represented by Intel's multi-core processors, are characterized by a relatively small number of processing cores and a multi-level cache. Many-core coprocessors are represented mainly by NVIDIA's general-purpose GPUs (General Purpose Graphics Processing Unit, abbreviated GPGPU) and Intel's Xeon Phi™ coprocessors. Judging from current and future trends, many-core coprocessors have become the basic platform for high-performance computing.

Compared with multi-core CPUs, general-purpose GPUs have strong parallel computing capability but weaker data management capability. They are poorly suited to managing complex data types and complex in-memory data structures or to processing complex control statements, and are better suited to computation over standard vectors, matrices and arrays. In addition, the cache capacity of a coprocessor is small relative to main memory. A coprocessor is usually attached as a PCI-E card serving as an expansion processing device of the server, so it must fetch data from memory under the control of the CPU, and data transfer between memory and the coprocessor is much slower than transfer between memory and the processor. The data processing time on a coprocessor therefore consists of two parts: the time to transfer data from memory to the coprocessor cache, and the time to process the data on the coprocessor.

In data warehouse applications, data is usually stored in a complex star schema or snowflake schema, generally consisting of one fact table and multiple dimension tables. Dimension tables store complex data types and require processing of complex predicate expressions. The fact table consists of foreign key attributes referencing the dimension tables and measure attributes; after joining with the dimension tables, the measure attributes must be grouped and aggregated according to the grouping attributes in the dimension tables. Dimension tables are numerous but hold little data (usually less than 5% of the total data volume), while their data types and data processing are complex. The fact table holds a huge volume of data, and its complex multi-table join operation is the performance bottleneck of analytical processing. At the same time, the parallel processing performance of analytical queries is affected by the complexity of the grouping operation and the type of aggregation. For example, aggregations such as sum, count and average are suitable for parallel processing, whereas aggregations such as median and percentile are difficult to parallelize. Complex grouping and aggregation tasks are therefore not well suited to general-purpose GPU processing.

On the other hand, He Bingsheng et al., in the paper "Relational query coprocessing on graphics processors" (ACM Transactions on Database Systems, Volume 34, No. 4, December 2009), proposed the Co-Processing computing model, which distributes the query workload cooperatively between a GPU worker and a CPU worker and uses a cost model of the CPU and the GPU to evaluate how relational operators should be allocated between the two processor nodes. The paper proposes that simple computations such as selection are suitable for CPU processing, while complex operations such as joins are suitable for GPU processing. GPUs are poorly suited to branch prediction, so the pipeline processing technique typical of databases is difficult to implement on a GPU. Join operations produce a large amount of materialized data. In data warehouse OLAP (online analytical processing) applications, a star join involves complex multi-table join operations, and parallel joins on a general-purpose GPU require a data partitioning step between each pair of join operators, so the data preprocessing cost is high. H. Pirk et al., in the paper "Accelerating foreign-key joins using asymmetric memory channels" (Proceedings of the International Conference on Very Large Data Bases 2011 (VLDB), 585-597), treat the GPU and the CPU as a distributed database. Because data access within a GPU node is fast while the data channel between the GPU and the CPU is slow, they propose storing the smaller dimension tables of the data warehouse in the general-purpose GPU, completing the join of the fact table foreign key attributes in the GPU through a foreign key index, and returning the join index as intermediate data for subsequent operations on the CPU. However, when a computer system is configured with multiple GPUs, fully replicating the dimension tables incurs a large data redundancy cost and reduces the utilization of the limited GPU cache. In addition, dimension tables have diverse data structures, high predicate complexity and small data volumes, which makes them better suited to CPU processing.

As far as the inventors know, however, there has been no research on using the GPU cache as an index storage engine to accelerate OLAP query processing over large in-memory data.

Summary of the invention

The technical problem to be solved by the present invention is to provide an OLAP star-join query optimization method under a GPU and CPU hybrid architecture. The method applies bitmap join indexes and star bitmap filtering to the GPU and CPU hybrid architecture, effectively optimizing complex multi-dimensional query processing.

To achieve the above purpose, the present invention adopts the following technical solution:

An OLAP star-join query optimization method under a GPU and CPU hybrid architecture comprises the following steps:

First, the OLAP star-join operation is optimized through bitmap join index filtering, with frequently accessed join bitmaps cached in the GPU cache. Second, the fact table foreign key attribute groups that satisfy the join bitmap filter condition are loaded into the GPU cache for star-join filtering. Finally, the filter bitmap generated by the GPU converts the full table scan of the large in-memory fact table into random access by position, thereby improving the query processing performance of the OLAP star join.

Preferably, the bitmaps corresponding to the frequently accessed keywords in the bitmap join index are taken as the members of the GPU bitmap index, and these member bitmaps are stored in the GPU cache.

Preferably, the fact table foreign key attributes are stored in memory, and the foreign key attributes that satisfy the index condition are extracted into the GPU cache through the filter bitmap generated by the GPU bitmap index.

Preferably, when a query is executed, the predicates in the query are first used to check whether matching join bitmaps exist in the GPU; if they do, the corresponding bitmap operations are executed in the GPU, and the dimension table predicate bitmaps are loaded into the GPU cache.

Preferably, the fact table foreign keys undergo star-join bitmap filtering in the GPU through foreign key mapping, and the filter bitmap is updated to determine the positions of the fact table records that finally satisfy the star join. The CPU filters the fact table with the filter bitmap generated by the GPU, converting the full table scan of the large fact table into random access by position.

Preferably, the filter bitmap is a bitmap generated by bit operations on the join bitmaps in the GPU cache that correspond to the query keywords, and is used to select the fact table foreign key groups that need to be transferred from memory to the GPU cache.

Preferably, in the OLAP star-join operation, each fact table foreign key is mapped to the corresponding position in the predicate bitmap of the corresponding dimension table, using the mapping between fact table foreign keys and dimension table record positions. The star join is thereby converted into a sequence of filtering operations of the fact table foreign key attributes against the corresponding dimension table predicate bitmaps.

Preferably, when a fact table foreign key attribute group undergoes star-join filtering, the filter bitmap is updated according to the filtering result: for each fact table record whose star-join filtering bit operation yields "0", the "1" at the corresponding position of the filter bitmap is set to "0".

Preferably, the filter bitmap indicates the relative positions of the fact table records that satisfy the predicates of the current query, and the corresponding fact table records are joined with the dimension tables to complete the subsequent grouping and aggregation.

Preferably, a% of the storage space in the GPU cache is set aside as the storage quota of the bitmap join index and used to cache the join bitmaps; the remaining (100 − a)% is used for the fact table foreign key attribute cache and the dimension table predicate bitmaps in the OLAP star-join operation, where a ranges from 20 to 60.

Compared with the prior art, the present invention optimizes OLAP star-join query processing under a GPU and CPU hybrid architecture. Storage and query processing are optimized between the general-purpose GPU cache and memory: the general-purpose GPU serves as a join bitmap storage and processing engine, and the bitmap index improves the efficiency of transferring fact table foreign key attributes to the GPU cache. Star-join filtering of the fact table foreign keys against the dimension table predicate bitmaps updates the filter bitmap, and the filter bitmap then enables random access by position in memory, which significantly improves the access efficiency and query processing performance for large fact tables.

Brief description of the drawings

Figure 1 is a schematic diagram of the data warehouse storage model under the GPU and CPU hybrid architecture;

Figure 2 is a schematic diagram of the bitmap operations on OLAP query keywords over the join bitmaps in the general-purpose GPU;

Figure 3 is a schematic diagram of filter-bitmap-based memory access to the fact table foreign key attribute records;

Figure 4 is a schematic diagram of filter-bitmap-based star filtering of fact table foreign keys against the dimension table bitmaps in the general-purpose GPU;

Figure 5 is a schematic diagram of the CPU extracting fact table records according to the filter bitmap from the general-purpose GPU to complete OLAP query processing.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

According to related research, a GPU and a CPU amount to a distributed system connected by a PCI-E bus. The GPU has a limited but fast cache and is suited to high-speed parallel processing of simple data types, but the cost of transferring data between the CPU and the GPU needs to be reduced further. The GPU cache is small but offers high data access performance, which makes it suitable for storing indexes that accelerate multi-table join operations. As noted above, many-core coprocessors have become the basic platform for high-performance computing. At present they are represented mainly by NVIDIA's general-purpose GPUs and Intel's Xeon Phi™ coprocessors. One such GPU has 2688 processing cores, 250 GB/s of video memory bandwidth and 6 GB of video memory; the Xeon Phi™ coprocessor 5110P has 60 cores (1.053 GHz), 240 physical threads, 8 GB of memory and 320 GB/s of bandwidth. The description below takes a general-purpose GPU as the example, but the technical content of the present invention applies equally to Xeon Phi™ coprocessors.

In view of the technical characteristics of GPUs and CPUs, the present invention proposes an OLAP star-join query optimization method under a GPU and CPU hybrid architecture. The method uses the general-purpose GPU as an OLAP star-join accelerator, applies bitmap join indexes to accelerate the star-join operation, and exploits both the storage access performance of the GPU's high-performance cache and the GPU's strong parallel processing capability to improve the query processing performance of the OLAP star join.

Because the general-purpose GPU cache is small but offers high bandwidth, the present invention first caches the frequently accessed join bitmaps, uses the GPU's strong parallel processing capability to speed up bitmap operations, and thereby improves index performance through GPU-accelerated bitmap indexing. Supported by the GPU bitmap index, the general-purpose GPU is used as an OLAP star-join accelerator: only the foreign key attribute groups of the large fact table that satisfy the join bitmap filter condition are loaded into the GPU cache for parallel star-join filtering, so that the GPU's parallel processing capability accelerates the star join. Finally, the small fact table filter bitmap generated by the general-purpose GPU is passed to the in-memory database, which uses it to filter the large fact table. This improves OLAP table scan efficiency and the query processing performance of the OLAP star join.

The main roles of the general-purpose GPU in the present invention are join index storage and processing, and star-join filtering of the fact table foreign keys. The generated filter bitmap indicates which fact table records ultimately need to be joined in the OLAP query. This eliminates the bandwidth consumed by accessing unnecessary fact table records and the cost of unnecessary join operations, and improves the efficiency of memory access.

In addition, to implement the present invention, the in-memory database engine must support bitmap-based positional access. That is, the column-store in-memory database must be able to perform random scans by position through a standard bitmap access interface, so as to improve table scan efficiency.
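As a minimal sketch of what such bitmap-based positional access could look like, the following Python fragment reads only the column positions whose bit is set in a packed bitmap. The function names `set_positions` and `gather`, the least-significant-bit-first packing, and the sample `lo_revenue` values are assumptions made for illustration; the patent does not specify the access interface.

```python
def set_positions(packed, n_rows):
    """Yield the row positions whose bit is 1.

    Assumed layout: bit i is stored in byte i // 8, at bit i % 8 (LSB first).
    """
    for i in range(n_rows):
        if (packed[i >> 3] >> (i & 7)) & 1:
            yield i


def gather(column, packed, n_rows):
    """Read only the column values at positions flagged in the bitmap."""
    return [column[i] for i in set_positions(packed, n_rows)]


# Hypothetical measure column of a 10-row fact table.
lo_revenue = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Packed filter bitmap flagging rows 0, 6 and 9.
packed = bytes([0b01000001, 0b00000010])

selected = gather(lo_revenue, packed, len(lo_revenue))
print(selected)
```

A production engine would more likely skip zero bytes or words rather than test every bit, but the access pattern, position lookups instead of a full scan, is the same.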

In the present invention, OLAP star-join query processing is divided into three execution phases: bitmap join index filtering, star filtering of fact table foreign keys against the dimension table bitmaps, and query processing based on the fact table filter bitmap. These three phases improve index access performance, reduce the amount of data the general-purpose GPU must fetch from memory, and convert the scan of a large number of records in memory into efficient random access by position. This reduces the cost of scanning the large fact table during query processing, eliminates the join cost for the records that are filtered out, and optimizes the overall performance of complex multi-dimensional query processing. A detailed description follows.

First, the present invention provides a data warehouse storage model under a GPU and CPU hybrid architecture. In the prior art, the storage and processing techniques of a data warehouse, such as indexes, fact table and dimension table storage, and the OLAP star-join operation, have not been optimized systematically under a unified GPU and CPU hybrid architecture. Moreover, conventional join techniques require complex data structures such as hash tables on the general-purpose GPU, which complicates GPU processing. For star-join OLAP queries, the prior-art star join based on partitioning must materialize a large number of intermediate join results, which lowers the utilization of the GPU cache and the parallel processing efficiency of the general-purpose GPU. At the same time, without index support, accelerating the fact table join on the general-purpose GPU requires moving a large amount of data between memory and the GPU cache, which increases the GPU's data preprocessing cost. To address these problems, the data warehouse storage model provided by the present invention optimizes, between the GPU and the CPU, both the bitmap index mechanism and the storage and query processing mechanisms for the fact table and the dimension tables.

In the data warehouse storage model shown in Figure 1, the data objects in the data warehouse mainly comprise the fact table, the dimension tables and the bitmap join index. The dimension tables are stored in memory, and the fact table is stored in memory or on disk. Memory includes DRAM and flash memory. It is connected to the cache of the general-purpose GPU (the video memory in Figure 1) through the PCI-E channel, and is accessed by the CPU cores through the memory channel.

The fact table can be stored in memory or on disk. With disk storage the fact table can hold a large amount of data, but the high-performance computing capability of the multi-core CPU or general-purpose GPU is offset by the large latency of disk I/O, which leaves the processors idle and wastes powerful parallel computing resources. The present invention therefore preferably assumes a large fact table stored in memory. The fact table uses column storage and comprises foreign key columns and measure columns. The fact table foreign key attributes are extracted into the GPU cache for star bitmap filtering at the positions specified by the filter bitmap, which is generated from the join bitmaps in the general-purpose GPU. The foreign key attributes can therefore be stored either as independent columns or all together as a column group forming a super column. The filter bitmap is generated by bit operations on the join bitmaps in the GPU cache that correspond to the query keywords. It selects the fact table foreign key groups that need to be transferred from memory to the GPU cache, which improves both the storage efficiency of the GPU cache and the data transfer efficiency. Measure attributes are stored in memory, using either column storage or a hybrid storage model depending on the characteristics of the schema and the storage engine in use.
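The extraction of foreign key groups by filter bitmap described above can be sketched as follows. This is an illustrative Python model only: `extract_fk_groups`, the list-of-0/1 bitmap representation and the sample column values are assumptions, and a real system would copy the selected values into GPU memory rather than build Python tuples.

```python
def extract_fk_groups(filter_bitmap, fk_columns):
    """Select the foreign key group of every fact row flagged in the filter bitmap.

    Returns the selected row positions and, for each position, one tuple holding
    the value of every foreign key column at that position.
    """
    positions = [i for i, bit in enumerate(filter_bitmap) if bit]
    groups = [tuple(column[i] for column in fk_columns) for i in positions]
    return positions, groups


# Hypothetical foreign key columns of an 8-row fact table (column storage).
lo_custkey = [1, 2, 3, 2, 3, 1, 1, 2]
lo_suppkey = [1, 4, 2, 3, 2, 1, 3, 4]

# Filter bitmap produced by the join bitmap operations: rows 0, 3, 4, 6 qualify.
filter_bitmap = [1, 0, 0, 1, 1, 0, 1, 0]

positions, groups = extract_fk_groups(filter_bitmap, [lo_custkey, lo_suppkey])
print(positions)
print(groups)
```

Only four of the eight foreign key groups would be transferred in this example, which is the saving in transfer volume and GPU cache space that the filter bitmap is meant to provide.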

The dimension tables reside in memory and are used to process the dimension table predicates in user queries. The resulting dimension table predicate filter bitmaps are transferred to the cache of the general-purpose GPU, where they serve as the bitmaps against which fact table records are filtered. In this process the in-memory database provides the storage and the complex predicate processing for the dimension tables. When a user issues a query, query rewriting first separates the predicates in the query by dimension table, so that the predicates on each dimension table are processed against that table alone and a corresponding dimension table predicate filter bitmap is generated. For example, the query

SELECT c_nation, sum(lo_revenue) AS profit
FROM customer, supplier, part, lineorder
WHERE lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND c_region = 'AMERICA'
AND s_region = 'AMERICA'
AND (p_category = 'MFGR#41' OR p_category = 'MFGR#42')
GROUP BY c_nation ORDER BY c_nation;

is rewritten into the following three queries. Each query evaluates all the predicate expressions on one dimension table together and generates a bitmap vector from the results, indicating for each dimension table record whether it satisfies the relevant predicates of the query:

SELECT CASE WHEN s_region = 'AMERICA' THEN 1 ELSE 0 END
FROM supplier;

SELECT CASE WHEN (p_category = 'MFGR#41' OR p_category = 'MFGR#42') THEN 1 ELSE 0 END
FROM part;

SELECT CASE WHEN c_region = 'AMERICA' THEN 1 ELSE 0 END
FROM customer;
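The effect of one rewritten query can be illustrated with a short Python sketch that evaluates a predicate over every dimension row and emits one bit per row. The `supplier` rows, the dictionary representation and the function name `predicate_bitmap` are hypothetical; in the patent this step is performed by the in-memory database.

```python
def predicate_bitmap(rows, predicate):
    """Return one 0/1 entry per dimension row, like SELECT CASE WHEN ... THEN 1 ELSE 0 END."""
    return [1 if predicate(row) else 0 for row in rows]


# Hypothetical supplier dimension table, ordered by s_suppkey.
supplier = [
    {"s_suppkey": 1, "s_region": "AMERICA"},
    {"s_suppkey": 2, "s_region": "ASIA"},
    {"s_suppkey": 3, "s_region": "AMERICA"},
    {"s_suppkey": 4, "s_region": "EUROPE"},
]

supplier_bitmap = predicate_bitmap(supplier, lambda r: r["s_region"] == "AMERICA")
print(supplier_bitmap)
```

Because the bitmap has one entry per dimension record in record order, a fact table foreign key can later be tested against it with a single offset lookup instead of a join.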

The bitmap join index is stored in memory or on disk. Through a pre-join operation it creates a bitmap for each member of a specified attribute column of a dimension table, marking the positions in the joined fact table at which that member joins. In the present invention, the bitmap join index can be created with the bitmap join index technique of a standard database for the low-cardinality attribute columns of a specified dimension table (or for one or more otherwise specified columns). Alternatively, bitmaps can be created in a user-defined manner for only some frequently accessed members of specified dimension attributes, with the bitmaps of members from different dimension tables and different attribute columns managed as a single bitmap join index object behind a unified bitmap storage and access interface, so that join bitmaps are accessed by keyword. By default, a join bitmap is created for every member of a column, indicating where that column member joins in the fact table. In one embodiment of the present invention, the frequently accessed join bitmaps of the bitmap join index are stored in the cache of the general-purpose GPU, which acts as a high-speed join bitmap store and bitmap index processing engine and provides the ability to look up the relevant bitmaps in the GPU by query keyword and perform bitmap operations on them. The join bitmaps matching the keywords in the query are located in the GPU cache and combined by bit operations to generate the filter bitmap; the corresponding foreign key attribute groups are then extracted from the fact table in memory into the GPU cache for star-join filtering.
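The keyword lookup and bit operations just described can be sketched in Python as follows. The cache layout (a dictionary keyed by column and value), the function `build_filter_bitmap`, and the rule that a predicate group with any uncached keyword leaves the filter unconstrained are all assumptions made for this illustration; the patent states only that matching join bitmaps are combined according to the query predicates. On a GPU each bit position would be handled by a parallel thread rather than a Python loop.

```python
def build_filter_bitmap(conjuncts, cache, n_rows):
    """Combine cached join bitmaps into a fact table filter bitmap.

    conjuncts: list of OR-groups; keys inside a group are ORed, groups are ANDed.
    A group with any uncached key is skipped (assumption: its predicate is left
    to the later star-join filtering phase, so no row may be excluded here).
    """
    result = [1] * n_rows
    for group in conjuncts:
        if not all(key in cache for key in group):
            continue
        ored = [0] * n_rows
        for key in group:
            ored = [a | b for a, b in zip(ored, cache[key])]
        result = [a & b for a, b in zip(result, ored)]
    return result


# Hypothetical join bitmaps cached on the GPU; entry i refers to fact row i.
cache = {
    ("c_region", "AMERICA"): [1, 1, 0, 1, 1, 0, 1, 0],
    ("s_region", "AMERICA"): [1, 0, 1, 1, 1, 0, 1, 1],
    ("p_category", "MFGR#41"): [1, 0, 0, 0, 1, 0, 0, 0],
}

query = [
    [("c_region", "AMERICA")],
    [("s_region", "AMERICA")],
    [("p_category", "MFGR#41"), ("p_category", "MFGR#42")],  # MFGR#42 not cached
]

filter_bitmap = build_filter_bitmap(query, cache, 8)
print(filter_bitmap)
```

Skipping an incompletely cached OR-group keeps the filter conservative: it may let through rows that the later dimension bitmap check removes, but it never drops a row that should qualify.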

Each bitmap in the bitmap join index (a join bitmap in Figure 1) has as many bits as the fact table has records, so the storage space of the index is (number of bitmaps × number of fact table records) / 8 bytes. Because the cache capacity of the general-purpose GPU is small, the present invention does not cache the entire bitmap join index. Instead, only the join bitmaps corresponding to frequently accessed predicate keywords are cached in the GPU cache, which improves the storage efficiency of the GPU cache. Specifically, the cache of the general-purpose GPU serves as a storage engine for frequently accessed join bitmaps: a% of the GPU cache is set aside as the storage quota of the bitmap join index and used to store join bitmaps, and the remaining (100 − a)% is used as the fact table foreign key attribute cache for the OLAP star-join operation. The value of a is specified by the user to balance index access performance against star-join processing performance. In one embodiment of the present invention, a ranges from 20 to 60 and is preferably 40.
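To make the sizing rule concrete, the sketch below computes the bitmap storage and the number of join bitmaps that fit in the quota. The fact table size of 600 million records is an assumed figure chosen for illustration; the 6 GB capacity is the GPU memory size quoted earlier, interpreted here as 6 × 1024³ bytes, and a = 40 is the preferred value. The function names are hypothetical.

```python
def join_bitmap_bytes(num_bitmaps, fact_rows):
    """Storage for the join bitmaps: one bit per fact row, rounded up to whole bytes."""
    return num_bitmaps * ((fact_rows + 7) // 8)


def max_cached_bitmaps(gpu_bytes, a_percent, fact_rows):
    """How many join bitmaps fit in the a% quota of the GPU cache."""
    quota = gpu_bytes * a_percent // 100
    return quota // join_bitmap_bytes(1, fact_rows)


FACT_ROWS = 600_000_000      # assumed fact table size, not from the patent
GPU_BYTES = 6 * 1024 ** 3    # 6 GB of GPU memory

one_bitmap = join_bitmap_bytes(1, FACT_ROWS)
capacity = max_cached_bitmaps(GPU_BYTES, 40, FACT_ROWS)
print(one_bitmap, capacity)
```

Under these assumptions each join bitmap occupies 75,000,000 bytes and 34 of them fit in the 40% quota, which shows why only the bitmaps of frequently accessed keywords can be kept on the GPU.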

Next, the OLAP star join technique of the present invention, based on the GPU and CPU hybrid architecture, is described.

In the present invention, a typical OLAP star join query task is decomposed into three execution phases. (1) The predicate keywords in the query are used to check whether matching join bitmaps are stored in the cache of the general-purpose GPU. If matching join bitmaps exist, the corresponding bitmap computation is carried out according to the query's predicate expression, and the GPU's parallel processing produces a filter bitmap. (2) The subset of fact table foreign key column groups at the positions where the filter bitmap is 1 is extracted, and the dimension table predicate filter bitmaps are loaded into the GPU cache. Each fact table foreign key value is mapped to an offset in the corresponding dimension table predicate filter bitmap, and foreign key join filtering tests whether the bit at that offset is 1, which implements star join filtering of fact table records across multiple dimension tables. Every position whose filter bitmap value is 1 but whose star join filtering result is 0 is set to 0, updating the filter bitmap. (3) The final filter bitmap generated by the GPU is transferred back to the CPU, which filters the fact table with it and completes the OLAP query processing.

In the above OLAP star join query processing, the star join operation is first optimized through the bitmap join index. The bitmaps of the frequently accessed keywords in a conventional bitmap join index become the members of the GPU bitmap index and are stored in the GPU cache, so that the small GPU cache holds the index data most beneficial to star join optimization. Because the fact table foreign key attributes are stored in memory, the filter bitmap generated by the GPU bitmap index is used first to extract only the foreign key attributes that satisfy the index condition into the GPU cache, which improves both the data transfer efficiency between memory and the GPU and the storage efficiency of the GPU cache. In addition, the full table scan of the fact table is the performance bottleneck of the OLAP star join operation. The present invention reduces the amount of fact table data scanned by an OLAP query through two levels of filtering, and uses the filter bitmap generated by the GPU to convert the full scan of the large in-memory fact table into a partial scan with random access by position, improving the storage access efficiency of the fact table.

When query processing for an OLAP star join is executed, the general-purpose GPU stores the frequently accessed join bitmaps. First, the predicates in the query are used to check whether matching join bitmaps exist in the GPU. If they do, the corresponding bitmap operations are executed in the GPU, whose parallel processing capability speeds them up; if they do not, this step is skipped. Second, only the dimension table predicate filter bitmaps are loaded into the GPU cache, which improves the utilization of the GPU cache, reduces the complexity of handling complex predicates on dimension tables, and simplifies the design of the database system. Third, the fact table foreign keys undergo star join bitmap filtering in the GPU through foreign key mapping, and the filter bitmap is updated to determine the positions of the fact table records that finally satisfy the star join. The CPU then filters the fact table with the GPU-generated filter bitmap, converting the full table scan of the large fact table into efficient random access by position, which improves memory bandwidth utilization and optimizes the query processing performance of the OLAP star join.

Figure 2 is a schematic diagram of the bitmap operations for OLAP query keywords on the join bitmaps in the general-purpose GPU. A conventional bitmap join index creates, for every member of one or more specified dimension table columns, a bitmap that marks its join relationship with the fact table, so low-cardinality columns are usually chosen to reduce the storage overhead of the index. In the first phase of the OLAP star join query, the bitmap join index can be built in the conventional way, but it is stored on a large-capacity disk. The bitmaps of the predicate keywords frequently accessed by queries are loaded through memory into the GPU cache, which improves the space utilization of the small GPU cache, and the GPU's high parallelism is used for the bit operations on the join bitmaps of the large fact table, improving bitmap index processing performance. In Figure 2, for example, the join bitmaps for the predicates c_region='AMERICA', c_region='AMERICA' and p_category='MFGR#41' in the query are stored in the GPU cache, but the join bitmap for p_category='MFGR#42' is not cached. The join bitmaps for the predicate keywords under the OR are therefore incomplete and do not take part in the join bitmap computation. The GPU executes a parallel AND over the join bitmaps of the keywords c_region='AMERICA' and c_region='AMERICA' and generates the join filter bitmap.
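The cache lookup and bitmap computation of this first phase can be sketched in Python as follows. This is a sequential sketch of the logic only, not GPU code and not the patented implementation. The function name `filter_bitmap`, the representation of the predicate expression as a list of OR groups that are ANDed together, and the second keyword `s_region='AMERICA'` are assumptions introduced for the example.

```python
def filter_bitmap(cached, conjuncts, num_rows):
    """AND together every OR group whose keywords are all cached.

    cached    : dict keyword -> join bitmap (int, bit i = fact row i)
    conjuncts : list of groups; keywords inside a group are ORed,
                groups are ANDed with each other
    num_rows  : number of fact table rows
    A group with any uncached keyword is skipped, so its predicate is
    left to the later star join filtering.
    """
    result = (1 << num_rows) - 1           # start with every row selected
    for group in conjuncts:
        if not all(keyword in cached for keyword in group):
            continue                       # incomplete OR group: skip it
        group_bits = 0
        for keyword in group:
            group_bits |= cached[keyword]
        result &= group_bits
    return result


cached = {
    "c_region='AMERICA'": 0b1101,
    "s_region='AMERICA'": 0b0111,          # hypothetical second keyword
    "p_category='MFGR#41'": 0b0001,
}
query = [
    ["c_region='AMERICA'"],
    ["s_region='AMERICA'"],
    ["p_category='MFGR#41'", "p_category='MFGR#42'"],   # MFGR#42 not cached
]
bits = filter_bitmap(cached, query, num_rows=4)
```

Skipping the incomplete OR group keeps the result a superset of the true answer: rows that fail the skipped predicate are removed later, when the foreign keys are tested against the dimension table predicate filter bitmaps.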

Figure 3 is a schematic diagram of memory access to the fact table foreign key columns based on the filter bitmap. In the second phase of the OLAP star join query, the join bitmap operations in the GPU have produced a filter bitmap with low selectivity. The foreign key attribute groups at the positions where the filter bitmap is 1 are then extracted from the fact table foreign key attributes into the GPU, where the star join operation is completed. The OLAP star join of the present invention does not parallelize a conventional join in the GPU. Instead, using the mapping between fact table foreign keys and dimension table record positions, each fact table foreign key is mapped to the corresponding position of the relevant dimension table predicate filter bitmap, so that the star join becomes a sequence of filtering operations of the foreign key attributes on the corresponding dimension table predicate filter bitmaps. This reduces to direct access by array subscript, which suits the parallel processing of a general-purpose GPU and performs well in parallel.

Extracting the fact table foreign key attributes through the filter bitmap greatly reduces the volume of foreign key data transferred between memory and the GPU cache, and improves bandwidth efficiency and the efficiency of GPU parallel processing. When the foreign key column groups are extracted according to the filter bitmap, the foreign key attribute groups at positions where the filter bitmap is 1 can be logically aggregated into data blocks, and each foreign key column group is tagged with the position of its 1 bit, identifying the bitmap position to which it corresponds. The data blocks formed from the extracted foreign key column groups are transferred to the GPU cache, ready for filtering against the dimension table predicate filter bitmaps.
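The extraction step can be pictured with the following Python sketch, which gathers the foreign key groups at the positions where the filter bitmap is 1 and tags each group with its position. The function name `extract_fk_groups` and the column-wise list layout are assumptions made for illustration.

```python
def extract_fk_groups(filter_bits, fk_columns):
    """Collect (position, foreign key group) pairs for rows whose bit is 1.

    filter_bits : list of 0/1 values, one per fact row
    fk_columns  : list of foreign key columns, each a list with one key
                  per fact row (column-wise storage is assumed)
    """
    return [
        (pos, tuple(column[pos] for column in fk_columns))
        for pos, bit in enumerate(filter_bits)
        if bit
    ]


filter_bits = [1, 0, 0, 1]
fk_columns = [
    [5, 3, 2, 5],   # first foreign key column
    [1, 1, 2, 1],   # second foreign key column
]
block = extract_fk_groups(filter_bits, fk_columns)
```

Only the tagged groups in `block` would need to cross the memory-to-GPU link, and the attached positions let the later filtering step update the right bits of the filter bitmap.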

Figure 4 shows the star filtering of fact table foreign keys against dimension table bitmaps in the general-purpose GPU, based on the filter bitmap. Under the multidimensional storage model of a data warehouse, the records of a dimension table can be mapped to an ordered sequence such as 1, 2, 3, .... A dimension table predicate filter bitmap has the same length as its dimension table and records, for each dimension record, whether the query's predicate expression on that dimension table is satisfied. The join filter bitmap is used to extract from the fact table the foreign key groups that satisfy the result of the bitmap index operation into the GPU cache; each foreign key is then mapped in turn to the corresponding position of the dimension table predicate filter bitmap, completing the star join filtering. During star join filtering of the foreign key attribute groups, the filter bitmap is updated according to the result: for each fact table record whose star join filtering yields "0", the "1" at the corresponding filter bitmap position is set to "0". For example, the second record (5, 1, 4, 2) in Figure 4 does not pass the star bitmap filtering, so the "1" at its position "4" in the bitmap is set to "0", updating the filter bitmap.
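A sequential Python sketch of this second-level filtering is given below. It is not GPU code; on a GPU each tagged foreign key group could be handled by its own thread, which is an interpretation rather than something the text spells out. The function name `star_join_filter`, the list-based bitmaps, and the 1-based foreign keys are assumptions.

```python
def star_join_filter(filter_bits, fk_groups, dim_bitmaps):
    """Clear filter bits of rows that fail any dimension predicate bitmap.

    filter_bits : list of 0/1 values, one per fact row
    fk_groups   : list of (position, (fk_1, ..., fk_n)) pairs, where
                  fk_j is the 1-based row number in dimension j
    dim_bitmaps : list of n dimension predicate filter bitmaps, each a
                  list of 0/1 values with one entry per dimension row
    Returns an updated copy of filter_bits.
    """
    updated = list(filter_bits)
    for pos, fks in fk_groups:
        for fk, dim_bits in zip(fks, dim_bitmaps):
            if dim_bits[fk - 1] == 0:      # direct array subscript access
                updated[pos] = 0           # row fails the star join filter
                break
    return updated


dims = [[1, 0, 1], [1, 1]]                 # two dimension predicate bitmaps
groups = [(0, (1, 2)), (2, (2, 1)), (3, (3, 1))]
result = star_join_filter([1, 0, 1, 1], groups, dims)
```

No join result is materialized: each foreign key is used only as an index into a small bitmap, and the single shared filter bitmap carries the outcome of both filtering levels.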

In the third phase of the OLAP star join query, the CPU extracts fact table records according to the GPU filter bitmap and completes the OLAP query processing. As shown in Figure 5, the GPU performs the bitmap operations on the bitmap join index to generate the filter bitmap, extracts the fact table foreign key attribute groups into the GPU cache according to it, completes star join filtering through the dimension table predicate filter bitmaps, updates the filter bitmap, and so determines the positions of the fact table records that satisfy the star join filter condition. After these two levels of filtering (join bitmap filtering and fact table star filtering), the filter bitmap indicates the relative positions of the fact table records that satisfy the predicates of the current query. These records still need an actual join with the dimension tables to complete the subsequent grouping and aggregation. In this join, some of the joined dimension tables can be pruned according to which join bitmaps were used, reducing the join nodes in the query tree. Finally, the filter positions generated in the GPU are transferred back to memory as an additional filter condition on the fact table in the in-memory database, converting the full scan of the large fact table into random access by bitmap position and thereby improving fact table access efficiency and OLAP query processing efficiency.
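The CPU side of the third phase can be sketched as a positional lookup followed by grouping and aggregation. The helper names and the assumption that the grouping key of each fact row has already been resolved through the remaining dimension joins are simplifications made for the example.

```python
def positions_of_ones(filter_bits):
    """Turn the final filter bitmap into a list of fact row positions."""
    return [pos for pos, bit in enumerate(filter_bits) if bit]


def aggregate_by_position(positions, group_keys, measures):
    """Sum a measure per group, touching only the selected fact rows.

    group_keys[pos] is assumed to be the already-resolved grouping key
    of fact row pos; measures[pos] is its measure value.
    """
    totals = {}
    for pos in positions:
        key = group_keys[pos]
        totals[key] = totals.get(key, 0) + measures[pos]
    return totals


final_bits = [1, 0, 0, 1, 1]
keys = ["a", "b", "a", "a", "c"]
profit = [10, 20, 30, 40, 50]
selected = positions_of_ones(final_bits)
totals = aggregate_by_position(selected, keys, profit)
```

The loop reads the measure columns only at the selected positions, which is the sense in which a full scan becomes random access by position.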

The filter bitmap generated by the general-purpose GPU must be transferred from the GPU cache to memory for the subsequent OLAP processing. When the filter bitmap is very sparse (enough positions are 0), it can be stored in compressed form to reduce the amount of data transferred. In one embodiment of the present invention, the bitmap is compressed by storing the offsets of the "1" bits as consecutive m-bit values; for example, the bitmap "01000010" is stored as (2, 7). The width m is determined by the number of fact table records N: m is the width of the smallest basic numeric type greater than ln N. For example, when ln N is less than 16, a short int (16-bit) value can store the offset of each "1". Compression is applied when the bitmap filter selectivity satisfies η < 1/m; in that case the filter bitmap is stored as a sequence of ηN consecutive m-bit values.
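The compression rule can be checked with a small Python sketch. The function names are hypothetical. The sketch assumes that an offset must be able to hold every value from 1 to N, so it picks the smallest standard integer width with at least that many bits; this is one reading of the bound on m given above, not a statement of the patented rule. The threshold follows from comparing sizes: ηN offsets of m bits each are smaller than the N-bit bitmap exactly when η < 1/m.

```python
def offset_width_bits(num_rows):
    """Smallest standard integer width able to hold offsets 1..num_rows."""
    needed = max(1, num_rows.bit_length())
    for width in (8, 16, 32, 64):
        if needed <= width:
            return width
    raise ValueError("fact table too large for a 64-bit offset")


def one_positions(bits):
    """1-based offsets of the '1' characters in a bitmap string."""
    return [i + 1 for i, b in enumerate(bits) if b == "1"]


def should_compress(num_ones, num_rows):
    """True when the offset list is smaller than the raw bitmap.

    Equivalent to selectivity < 1/m with selectivity = num_ones / num_rows.
    """
    m = offset_width_bits(num_rows)
    return num_ones * m < num_rows


offsets = one_positions("01000010")
width = offset_width_bits(24_000_000)
```

For a fact table of 24,000,000 rows the sketch selects 32-bit offsets, so under this reading compression pays off only when fewer than 1 row in 32 survives the filter.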

In one embodiment of the present invention, the filter bitmap generated by the index is used to select join records on the fact table, and the query Q originally entered by the user is rewritten into query Q' according to the index bitmaps. If a predicate keyword has a join bitmap and the corresponding dimension table attribute does not appear among the grouping attributes, the filter bitmap generated by the index already implies the join between the fact table and that dimension table. In the SQL example, the predicate keywords in s_region='AMERICAN' AND (p_mfgr='MFGR#1' OR p_mfgr='MFGR#2') have join bitmaps, so the filter bitmap implies the joins of the fact table with the dimension tables supplier and part. In query Q', the joins of lineorder with the supplier and part tables can therefore be pruned, which reduces the number of table joins to two in addition to lowering the fact table scan cost. The original query Q and the optimized query Q' are as follows:

Original query Q:

SELECT d_year, c_nation, SUM(lo_revenue - lo_supplycost) AS profit
FROM date, customer, supplier, part, lineorder
WHERE lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND c_region = 'AMERICAN'
AND s_region = 'AMERICAN'
AND (p_mfgr = 'MFGR#1' OR p_mfgr = 'MFGR#2')
GROUP BY d_year, c_nation
ORDER BY d_year, c_nation

Optimized query Q':

FROM date, customer, lineorder
WHERE lo_custkey = c_custkey
AND lo_orderdate = d_datekey
AND c_region = 'AMERICAN'
GROUP BY d_year, c_nation
ORDER BY d_year, c_nation
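The pruning rule behind the rewrite from Q to Q' can be sketched in Python. The function name `prunable_dimensions` and the dictionary layout are assumptions; the rule itself (all predicate keywords of the dimension have join bitmaps, and the dimension contributes no grouping attribute) is the one stated above.

```python
def prunable_dimensions(predicates_by_dim, cached_keywords, grouping_dims):
    """Return the dimension tables whose join can be dropped from the query.

    predicates_by_dim : dict dimension name -> list of predicate keywords
    cached_keywords   : set of keywords that have a join bitmap
    grouping_dims     : set of dimensions supplying GROUP BY attributes
    """
    pruned = []
    for dim, keywords in predicates_by_dim.items():
        if dim in grouping_dims:
            continue                      # still needed for grouping values
        if all(keyword in cached_keywords for keyword in keywords):
            pruned.append(dim)            # filter bitmap already implies the join
    return pruned


predicates = {
    "customer": ["c_region='AMERICAN'"],
    "supplier": ["s_region='AMERICAN'"],
    "part": ["p_mfgr='MFGR#1'", "p_mfgr='MFGR#2'"],
}
cached = {kw for kws in predicates.values() for kw in kws}
pruned = prunable_dimensions(predicates, cached, {"date", "customer"})
```

For query Q this yields supplier and part, leaving only the joins with date and customer, whose attributes d_year and c_nation appear in the GROUP BY clause.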

To verify the practical effect of the OLAP star join query optimization method provided by the present invention, the inventors used an ordinary desktop computer as the experimental platform: an Intel Core i3-2350M CPU at 2.30 GHz, 8 GB of memory, 64-bit Windows 7, and a GeForce 610M graphics card with CUDA compute capability 2.1, 1 GB of video memory, a GPU clock of 1.48 GHz, 48 CUDA cores, and 1024 threads per block. The data bandwidth between GPU video memory and main memory is about 2 GB/s.

The inventors simulated a big data in-memory storage scenario. The dimension tables and the fact table are all stored in memory, and the GPU video memory stores only the bitmap join index of the selected top-K keywords. The GPU's parallel processing capability accelerates the index bitmap computation, only the reduced set of fact table foreign keys remaining after index filtering is loaded into the GPU for star join processing, and the grouping and aggregation computation, which suffers from high parallel write contention, is moved to the CPU, so that each processor is used for what it does best.

For the tests, the inventors chose SSB (Star Schema Benchmark) as the benchmark, with a 4 GB data set (SF = 4, 24,000,000 rows). The predicate keywords used in the queries serve as the system index keywords; their join bitmaps are built in a preprocessing phase and stored in GPU video memory, and the GPU is used as an independent index processing engine. The most representative query group, Q4, was chosen for the test queries. A Q4 query joins the fact table with four dimension tables and contains many predicate expressions, so it best shows how the cost of bit operations among the multiple bitmaps produced by the keyword bitmap join index is optimized.
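A back-of-the-envelope Python calculation shows why limiting transfers matters on this platform. It is an estimate under stated assumptions, not a measured result: the bandwidth of about 2 GB/s is read as 2 × 10^9 bytes per second, and four foreign key columns with 4-byte keys are assumed for the fact table.

```python
ROWS = 24_000_000
BANDWIDTH = 2e9                 # assumption: 2 GB/s taken as 2 * 10**9 bytes/s


def transfer_seconds(num_bytes, bandwidth=BANDWIDTH):
    """Idealized transfer time, ignoring latency and protocol overhead."""
    return num_bytes / bandwidth


bitmap_bytes = ROWS // 8        # one filter bitmap: one bit per fact row
fk_bytes = ROWS * 4 * 4         # assumption: 4 foreign key columns, 4 bytes each

bitmap_time = transfer_seconds(bitmap_bytes)
full_fk_time = transfer_seconds(fk_bytes)
```

Under these assumptions a full filter bitmap moves in about 1.5 ms, whereas shipping every foreign key would take roughly 0.19 s, so extracting only the foreign keys selected by the filter bitmap shrinks the dominant transfer in proportion to the selectivity.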

On the CPU/GPU hybrid platform, OLAP star join query processing is divided into the following steps:

◆ BitFilter creation by the index

◆ BitFilter transfer: GPU→CPU

◆ Fact table foreign key group transfer: CPU→GPU

◆ Star join and grouping vector generation

◆ Grouping vector transfer: GPU→CPU

◆ Aggregation of fact table measure attributes

Of these, processing on the CPU platform comprises three steps: BitFilter creation by the index, star join with grouping vector generation, and aggregation of fact table measure attributes. The CPU/GPU hybrid Co-OLAP comprises all of the steps. In the experiments, the GPU index mechanism and the star join optimization minimize the amount of data transferred between CPU and GPU during a query.

Table 1: Query execution cost analysis

Table 1 shows the cost analysis of query execution. Comparing the CPU query processing and the CPU/GPU cooperative processing recorded in Table 1, the bitmap operations and star join operations that are expensive on the CPU achieve very high performance on the GPU: bitmap computations that take tens of thousands of microseconds on a multi-core CPU finish within tens of microseconds. Very high performance is obtained even on the very long bitmaps of big data. Because the star join has been optimized for the GPU's processing characteristics, reducing a complex join to direct positional access on arrays that suits the GPU's parallel processing model, multi-table star joins also show a very large performance gain.

Compared with the prior art, the present invention has the following technical features:

1. The general-purpose GPU cache serves as the storage engine for frequently accessed join bitmaps, so that a small GPU cache stores a small set of join bitmaps. When a query keyword hits in the join bitmaps, the GPU provides high-performance parallel bitmap access and bit operations across multiple bitmaps, improving join bitmap computation (on a CPU, bit operations between large bitmaps are comparatively expensive);

2. The GPU cache stores bitmaps that are frequently accessed and have low selectivity. When a query hits several bitmap indexes, the resulting filter bitmap has low selectivity and filters out most fact table records. Only the relevant foreign key column data at positions where the filter bitmap is 1 need to be extracted into the GPU cache for star join processing, which greatly reduces the data transferred between memory and the GPU and improves data access performance;

3. In the GPU, fact table foreign key records are filtered a second time through the dimension table predicate filter bitmaps. By mapping each foreign key value to the corresponding position of a dimension table predicate filter bitmap generated by the CPU, the join between a fact table foreign key and a dimension table becomes a filtering operation of the foreign key on that bitmap, and the star join becomes filtering of the foreign key attributes on multiple dimension table bitmaps. In the GPU, the star join is completed by filtering the array-structured foreign keys against the dimension table predicate filter bitmaps in sequence, without materializing the result of each foreign key join;

4. The bitmap index is the first level of filtering and the fact table foreign key star join filtering is the second; both levels share one filter bitmap. The second level updates the bitmap values produced by the first. After the GPU completes both, the resulting filter bitmap is passed to memory, and the CPU extracts the corresponding records from the fact table according to the bitmap for OLAP query processing, which reduces the scan cost of the large fact table and effectively improves OLAP query performance on big data.

In summary, the present invention applies the bitmap join index and star bitmap filtering techniques to a GPU and CPU hybrid architecture, improving the storage efficiency of the general-purpose GPU cache and the parallel processing efficiency of the GPU, and thereby improving the overall OLAP query processing performance of the hybrid processor platform. The present invention is suitable not only for in-memory database applications on a GPU and CPU hybrid architecture but also for analytical processing applications in general-purpose databases.

The OLAP star join query optimization method under a GPU and CPU hybrid architecture provided by the present invention has been described in detail above. For those skilled in the art, any obvious modification made to it without departing from the essence of the present invention will constitute an infringement of the patent right of the present invention, and the corresponding legal liability shall be borne.

Claims

1. An OLAP star join query optimization method under a GPU and CPU hybrid architecture, characterized by comprising the following steps:

optimizing the OLAP star join operation through bitmap join index filtering, and caching the frequently accessed join bitmaps in the GPU cache;

loading the fact table foreign key attribute groups that satisfy the filter condition of the join bitmaps into the GPU cache for star join filtering;

converting, through the filter bitmap generated by the GPU, the full table scan of the large in-memory fact table into random access by position, thereby improving the query processing performance of the OLAP star join;

wherein, when query processing is performed, the predicates in the query are first used to check whether matching join bitmaps exist in the GPU; if they exist, the corresponding bitmap operations are performed in the GPU, and the dimension table predicate bitmaps are loaded into the GPU cache.

2. The OLAP star join query optimization method as claimed in claim 1, characterized in that:

the bitmaps corresponding to the frequently accessed keywords in the bitmap join index are used as the members of the GPU bitmap index, and the member bitmaps are stored in the GPU cache.

3. The OLAP star join query optimization method as claimed in claim 1, characterized in that:

the fact table foreign key attributes are stored in memory, and the fact table foreign key attributes that satisfy the index condition are extracted into the GPU cache through the filter bitmap generated by the GPU bitmap index.

4. The OLAP star join query optimization method as claimed in claim 1, characterized in that:

the fact table foreign keys undergo star join bitmap filtering in the GPU through foreign key mapping, and the filter bitmap is updated to determine the positions of the fact table records that finally satisfy the star join; the CPU filters the fact table through the filter bitmap generated by the GPU, converting the full table scan of the large fact table into random access by position.

5. The OLAP star join query optimization method as claimed in claim 1, 3 or 4, characterized in that:

the filter bitmap is a bitmap generated by bit operations on the join bitmaps in the GPU cache that correspond to the query keywords, and is used to filter the fact table foreign key groups that need to be transferred from memory to the GPU cache.

6. The OLAP star join query optimization method as claimed in claim 5, characterized in that:

the filter bitmap indicates the relative positions of the records in the fact table that satisfy the predicate operations of the current query, and the corresponding fact table records are joined with the dimension tables to complete the subsequent grouping and aggregation operations.

7. The OLAP star join query optimization method as claimed in claim 1, characterized in that:

in the OLAP star join operation, the fact table foreign keys are mapped to the corresponding positions of the dimension table predicate bitmaps through the mapping relationship between fact table foreign keys and dimension table record positions, and the star join is converted into filtering operations of the fact table foreign key attributes, in turn, on the corresponding dimension table predicate bitmaps.

8. The OLAP star join query optimization method as claimed in claim 7, characterized in that:

when star join filtering is performed on the fact table foreign key attribute groups, the filter bitmap is updated according to the filtering result: for each fact table record whose star join filtering bit operation result is "0", the "1" at the corresponding filter bitmap position is set to "0".

9. The OLAP star join query optimization method as claimed in any one of claims 1 to 3, characterized in that:

a% of the storage space in the GPU cache is used as the storage quota of the bitmap join index for caching join bitmaps, and 1 − a% of the storage space is used as the fact table foreign key attribute cache in the OLAP star join operation, wherein the value of a ranges from 20 to 60.