
CN114637775A - Query optimization system, method and equipment based on Monte Carlo tree search and reinforcement learning


Info

Publication number
CN114637775A
CN114637775A
Authority
CN
China
Prior art keywords
query
search
node
query plan
monte carlo
Prior art date
Legal status
Pending
Application number
CN202210319758.6A
Other languages
Chinese (zh)
Inventor
王宏志
张恺欣
崔双双
丁小欧
Current Assignee
Harbin Institute of Technology Shenzhen
Original Assignee
Harbin Institute of Technology Shenzhen
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology Shenzhen filed Critical Harbin Institute of Technology Shenzhen
Priority to CN202210319758.6A priority Critical patent/CN114637775A/en
Publication of CN114637775A publication Critical patent/CN114637775A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24542Plan optimisation
    • G06F16/24545Selectivity estimation or determination
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Biomedical Technology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Operations Research (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A query optimization system, method and device based on Monte Carlo tree search and reinforcement learning, belonging to the field of computer technology. To address the weak compatibility and poor stability of the existing NEO query optimization method, the system of the invention adopts the same framework as the NEO query optimization model. A value model unit predicts the cost of a query plan from the features corresponding to that plan, based on the value model; the value model is a neural network whose input is a vector tree representing the query plan whose cost is to be estimated. The topology of the vector tree is a binary tree, and the node encodings are concatenated in the level-order traversal order of the tree; the feature of each node consists of the encoding of its node information. A query plan search unit uses Monte Carlo tree search, guided by the query plan -> execution time predictions, to generate an execution plan from the search space. The invention is mainly used for query optimization in computers.

Description

Query optimization system, method and device based on Monte Carlo tree search and reinforcement learning

TECHNICAL FIELD

The invention relates to a query optimization system and method, and belongs to the field of computer technology.

BACKGROUND ART

At present, query optimization algorithms fall into two categories: traditional query optimization algorithms and AI-based query optimization algorithms. The former usually use cardinality/cost estimation techniques to pick a query plan from several candidates, or apply predefined rules to optimize an initial query plan. Such techniques are the most widely used today, but they are relatively dated. The latter use AI techniques for cardinality/cost estimation, or directly use machine learning to generate optimized query plans. Among these methods, the most advanced and representative are NEO and BAO.

NEO is an end-to-end, learning-based query optimization method that can optimize join order, index selection and physical operator selection in relational databases. It models query plan generation as a Markov model, uses a reinforcement learning algorithm to train a deep learning model based on a tree convolutional neural network, and uses the model's output to incrementally build a query plan tree equivalent to the input SQL query. The input of this Markov model (i.e. the "state" in reinforcement learning) has two parts: a query-level encoding and a plan-level encoding; the encoding scheme is shown in Figures 1 and 2. See the paper "Neo: A Learned Query Optimizer", or related commentary such as https://blog.csdn.net/cuirise/article/details/111466631 and https://zhuanlan.zhihu.com/p/80233443.

The query-level encoding captures the join relationships among the tables in the database (the red vector) and the predicates used in those joins (the blue vector). The plan-level encoding captures the structure of the entire query (i.e. the structure of the query plan tree) and the access method of each operator (index scan or direct table access).
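For intuition only, the following Python sketch shows one way such a query-level encoding could look. It is a simplified illustration and not NEO's exact scheme; the table names, predicate vocabulary and function name are hypothetical.

```python
import numpy as np

TABLES = ["title", "cast_info", "movie_info"]                 # hypothetical schema
PREDICATES = ["title.production_year > 2000", "cast_info.role_id = 1"]

def query_level_encoding(joined_pairs, used_predicates):
    # One slot per unordered table pair: 1 if the query joins that pair
    # (the "red vector"), followed by a one-hot over the predicate
    # vocabulary (the "blue vector").
    pairs = [(i, j) for i in range(len(TABLES)) for j in range(i + 1, len(TABLES))]
    join_vec = np.zeros(len(pairs), dtype=np.float32)
    for a, b in joined_pairs:
        idx = tuple(sorted((TABLES.index(a), TABLES.index(b))))
        join_vec[pairs.index(idx)] = 1.0
    pred_vec = np.array([1.0 if p in used_predicates else 0.0 for p in PREDICATES],
                        dtype=np.float32)
    return np.concatenate([join_vec, pred_vec])

# Example: a query joining title with cast_info and filtering on production_year.
enc = query_level_encoding([("title", "cast_info")],
                           ["title.production_year > 2000"])
```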

Using these two encodings as input, NEO predicts the "value" of a query plan with a tree convolutional neural network. The value model is trained on the historical query plans stored in an experience pool together with their measured costs; the loss function is MSE, and the label for a plan is the minimum cost observed among the different plans of the same query in the experience pool.
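A minimal sketch of how such labels could be derived from an experience pool is given below; the tuple layout and the function name build_training_samples are illustrative assumptions, not NEO's actual code.

```python
from collections import defaultdict

def build_training_samples(experience):
    # experience: list of (query_id, plan, measured_cost) tuples from the pool.
    best_cost = defaultdict(lambda: float("inf"))
    for query_id, _plan, cost in experience:
        best_cost[query_id] = min(best_cost[query_id], cost)
    # Label each plan with the minimum cost observed among all plans of the
    # same query; the MSE loss is regressed against this label.
    return [(plan, best_cost[query_id]) for query_id, plan, _cost in experience]
```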

As shown in Figure 3, the main workflow of NEO is as follows:

1. Use a "weak" optimizer (such as PostgreSQL's optimizer) to generate an initial execution plan; this step is called bootstrap.

2. Experience denotes a collection of (execution plan -> execution time) pairs, which serve as the training samples of the value model.

3. The featurizer extracts two types of features: query features and execution plan features.

4. Train the value model with the extracted features, establishing a query plan -> execution time prediction model.

5. Search for a query plan on top of this model; NEO uses a simple greedy search to select an execution plan.

The plan search maintains a min-heap whose elements are sub-query plans, prioritized by the value model's prediction. At each step, the sub-plan with the smallest predicted cost is popped from the top of the heap (initially the heap contains only the root of the query tree), all of its legal child sub-plans are expanded (i.e. child nodes of the query plan tree are added), the minimum cost of the complete query plan reachable from each expanded sub-plan is predicted (the cost here is query execution time), and the sub-plan with the smallest predicted cost is pushed back into the heap. This process repeats until the sub-plans have been fully expanded into a complete query plan.
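The following Python sketch illustrates this heap-based greedy search as described above; expand_children and predict_min_cost are hypothetical callbacks standing in for the plan expander and the value model.

```python
import heapq
import itertools

def greedy_plan_search(root_subplan, expand_children, predict_min_cost):
    # expand_children(subplan)  -> list of legal child sub-plans (empty if complete)
    # predict_min_cost(subplan) -> value-model estimate of the best complete plan
    #                              reachable from the sub-plan (execution time)
    counter = itertools.count()          # tie-breaker so heapq never compares plans
    heap = [(predict_min_cost(root_subplan), next(counter), root_subplan)]
    while heap:
        _, _, subplan = heapq.heappop(heap)     # cheapest sub-plan on the heap
        children = expand_children(subplan)
        if not children:                        # fully expanded: a complete plan
            return subplan
        best = min(children, key=predict_min_cost)
        heapq.heappush(heap, (predict_min_cost(best), next(counter), best))
    return None
```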

It therefore has the following problems:

1. The technique supports only a limited subset of in-database relational operations; specifically, it supports only selection, projection, equi-join and aggregation queries.

2. Its performance is unstable: on some query workloads, the query plans produced by this scheme have long execution times, exhibiting a long-tail distribution problem.

SUMMARY OF THE INVENTION

The present invention aims to solve the problems of weak compatibility and poor stability in the existing NEO query optimization method.

A query optimization system based on Monte Carlo tree search and reinforcement learning, the system adopting the same framework as the NEO query optimization model and comprising a value model unit and a query plan search unit.

Value model unit: predicts the cost of a query plan from the features corresponding to the query plan, based on the value model.

The value model is a neural network. Its input is a vector tree representing the query plan whose cost is to be estimated; the topology of the vector tree is a binary tree, and the node encodings are concatenated in the level-order traversal order of the tree. The feature of each node consists of the encoding of its node information, which includes the node type, the tables involved in the operation, the index types involved in the operation, the predicates involved in the operation, and the cardinality estimate of the node.

Query plan search unit: uses the Monte Carlo tree search method, performing the query plan search according to the query plan -> execution time predictions and generating an execution plan from the search space.

Further, the node information is encoded as follows:

The node type is one-hot encoded; the tables involved in the operation are one-hot encoded; the index types involved in the operation are one-hot encoded; the predicates involved in the operation are one-hot encoded; the cardinality estimate of the node is an integer value.
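A minimal sketch of this node encoding is shown below, assuming small hypothetical vocabularies for node types, tables, index types and predicates; in a real system these vocabularies would come from the database schema and the workload.

```python
import numpy as np

# Hypothetical vocabularies (illustrative only).
NODE_TYPES  = ["SeqScan", "IndexScan", "NestedLoop", "HashJoin",
               "MergeJoin", "Sort", "Aggregate", "Empty"]
TABLES      = ["title", "cast_info", "movie_info"]
INDEX_TYPES = ["none", "btree", "hash"]
PREDICATES  = ["title.id = cast_info.movie_id", "title.production_year > 2000"]

def one_hot(values, vocab):
    vec = np.zeros(len(vocab), dtype=np.float32)
    for v in values:
        vec[vocab.index(v)] = 1.0
    return vec

def encode_node(node_type, tables, index_type, predicates, cardinality_estimate):
    # Concatenate the one-hot encodings and the integer cardinality estimate
    # into a single node feature vector, as described above.
    return np.concatenate([
        one_hot([node_type], NODE_TYPES),
        one_hot(tables, TABLES),
        one_hot([index_type], INDEX_TYPES),
        one_hot(predicates, PREDICATES),
        np.array([float(cardinality_estimate)], dtype=np.float32),
    ])

# Example: a hash join over title and cast_info with an estimated cardinality of 4200.
feat = encode_node("HashJoin", ["title", "cast_info"], "none",
                   ["title.id = cast_info.movie_id"], 4200)
```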

Further, the neural network used as the value model adopts an architecture of multiple stacked tree convolution layers.

Further, when the query plan search unit searches using the Monte Carlo tree search method, the process of generating a better query plan is modeled as a Markov process; Monte Carlo tree search is used to expand sub-query plans; the vector tree encodings of semantically consistent sub-query plans are used as the tree nodes of the Monte Carlo tree search; and the value model is used to evaluate the cost of a sub-query plan at each search step, thereby realizing the Monte Carlo tree search.

Further, the search process of the query plan search unit using the Monte Carlo tree search method comprises the following steps:

(1) Selection: starting from the root node R, recursively select the best child node using the UCT algorithm until a leaf node L is reached;

(2) Expansion: if L is not a terminal node, create one or more child nodes that do not contradict the semantics of the query, and select one of them, C;

(3) Simulation: starting from C, randomly select non-contradictory child nodes until the current query plan is fully expressed;

(4) Backpropagation: use the value model to predict the cost of the simulated complete query plan, and starting from this leaf node, update the visit counts and rewards of all of its ancestor nodes;

(5) Repeat steps (1)-(4), continually expanding the sub-query plan until a complete query plan is generated.

A query optimization method based on Monte Carlo tree search and reinforcement learning, comprising the following steps:

Use the value model as the query plan -> execution time prediction model, and perform the query plan search on this prediction model.

The value model is a neural network. Its input is a vector tree representing the query plan whose cost is to be estimated; the topology of the vector tree is a binary tree, and the node encodings are concatenated in the level-order traversal order of the tree. The feature of each node consists of the encoding of its node information, which includes the node type, the tables involved in the operation, the index types involved in the operation, the predicates involved in the operation, and the cardinality estimate of the node.

The query plan search uses the Monte Carlo tree search method, searching for a query plan according to the query plan -> execution time predictions and generating an execution plan from the search space.

Further, the training process of the value model comprises the following steps (a minimal sketch of this procedure is given after the steps):

S1. Take the execution plan and execution time corresponding to a query as an experience and construct an experience set; use the execution plan and execution time of each experience in the experience set as one training sample for the value model.

S2. Featurize the training samples, i.e. extract features, including query features and execution plan features.

S3. Train the value model with the extracted features; the trained value model is the established query plan -> execution time prediction model.
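A minimal sketch of steps S1-S3 is given below, as referenced above. It assumes the experiences have already been featurized into (vector tree, measured cost) pairs; the model interface, hyperparameters and function name are illustrative assumptions.

```python
import torch
import torch.nn as nn

def train_value_model(model, samples, epochs=20, lr=1e-3):
    # samples: list of (vector_tree, cost) pairs built from the experience set.
    # model: maps a featurized plan (vector tree) to a predicted scalar cost.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for vector_tree, cost in samples:
            pred = model(vector_tree)
            target = torch.tensor([cost], dtype=torch.float32)
            loss = loss_fn(pred, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```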

A query optimization device based on Monte Carlo tree search and reinforcement learning, the device being a storage medium in which at least one instruction is stored, the at least one instruction being loaded and executed by a processor to implement the above query optimization method based on Monte Carlo tree search and reinforcement learning.

A query optimization device based on Monte Carlo tree search and reinforcement learning, the device comprising a processor and a memory, the memory storing at least one instruction, the at least one instruction being loaded and executed by the processor to implement the above query optimization method based on Monte Carlo tree search and reinforcement learning.

Beneficial effects:

By improving the value model used in NEO and replacing NEO's greedy plan search method, the present invention achieves compatibility with all relational operations in a relational database and delivers higher and more stable performance than NEO.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic diagram of NEO's query-level encoding;

Figure 2 is a schematic diagram of NEO's plan-level encoding;

Figure 3 is a schematic diagram of the NEO system framework;

Figure 4 is an example of the query plan vector tree encoding of the present invention;

Figure 5 is an example of the structure of the value model of the present invention;

Figure 6 is a schematic diagram of the Monte Carlo tree search process.

DETAILED DESCRIPTION OF THE EMBODIMENTS

Embodiment 1:

This embodiment is a query optimization method based on Monte Carlo tree search and reinforcement learning; the query optimization process is realized by a query optimization system based on Monte Carlo tree search and reinforcement learning. Like NEO, the present invention is a cost-based, AI-based query optimization method, so the framework of the system of the present invention (the query optimization system based on Monte Carlo tree search and reinforcement learning) is also as shown in Figure 3: both use a pre-trained value model to predict the cost of query plans and to guide the query plan search. The two differ in the network structure of the value model and in the plan search method. That is, the system of the present invention has the same framework as the NEO query optimization model, comprising a value model unit and a query plan search unit:

Value model unit: predicts the cost of a query plan from the features corresponding to the query plan, based on the value model.

The value model is a neural network. Its input is a vector tree representing the query plan whose cost is to be estimated; the topology of the vector tree is a binary tree, and the node encodings are concatenated in the level-order traversal order of the tree. The feature of each node consists of the encoding of its node information, which includes the node type, the tables involved in the operation, the index types involved in the operation, the predicates involved in the operation, and the cardinality estimate of the node.

Query plan search unit: uses the Monte Carlo tree search method, performing the query plan search according to the query plan -> execution time predictions and generating an execution plan from the search space.

The query optimization method based on Monte Carlo tree search and reinforcement learning of this embodiment comprises the following steps:

1. Use a "weak" optimizer (such as PostgreSQL's optimizer) to generate an initial execution plan; this step is called bootstrap.

2. Experience denotes a collection of (execution plan -> execution time) pairs, which serve as the training samples of the value model.

3. The featurizer extracts features: query features and execution plan features.

4. Use the neural network proposed by the present invention as the value model, train it with the extracted features, and establish the query plan -> execution time prediction model.

5. Perform the query plan search on the query plan -> execution time prediction model. Unlike NEO, the present invention uses the Monte Carlo tree search method to generate an execution plan from the search space.

Regarding the value model: since query cardinality is usually linearly related to query cost, an accurate cardinality estimate effectively improves the accuracy of the cost prediction. The present invention therefore introduces a cardinality estimation strategy based on the data distribution into the value model as a feature: when estimating the cost of a given query plan, Naru, one of the most effective cardinality estimation techniques, is used to estimate the cardinality of each node in the query plan. The estimation results are encoded into the vector tree as input to the value model.

The input to the value model is a vector tree representing the query plan whose cost is to be estimated. As shown in Figure 4, the input to the value model is as follows:

Topology: a binary tree; the node encodings are concatenated in the level-order traversal order of the tree.

Node feature: a one-dimensional vector composed of the encodings of the following information:

one-hot encoding of the node type (all physical relational operators are supported, including an empty node);

tables involved in the operation: one-hot encoding;

index types involved in the operation: one-hot encoding;

predicates involved in the operation: one-hot encoding;

cardinality estimate of the node: an integer value.

Through this encoding, the present invention gives a unified, joint representation of NEO's query-level and plan-level encodings, which makes it easier for the machine learning model to extract features. Because the input changes, the network architecture of the value model also differs from NEO's: it adopts an architecture of multiple stacked tree convolution layers and introduces residual connections and batch normalization with fully connected layers (BN) to stabilize training, as shown in Figure 5.
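The sketch below illustrates this kind of architecture. It is a simplified assumption rather than the exact network of Figure 5: the tree convolution is reduced to a shared linear filter over each (node, left child, right child) triple, layer normalization stands in for batch normalization purely to keep the single-sample example simple, and all dimensions are placeholders.

```python
import torch
import torch.nn as nn

class TreeConv(nn.Module):
    # One tree-convolution layer: a shared filter over each (node, left, right)
    # triple of the current feature tree; missing children contribute zeros.
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.filter = nn.Linear(3 * in_dim, out_dim)
        self.residual = (in_dim == out_dim)

    def forward(self, tree):
        feat, left, right = tree                      # tree = (feature, left, right)
        zero = torch.zeros_like(feat)
        lf = left[0] if left is not None else zero
        rf = right[0] if right is not None else zero
        new = torch.relu(self.filter(torch.cat([feat, lf, rf], dim=-1)))
        if self.residual:
            new = new + feat                          # residual connection
        return (new,
                self.forward(left) if left is not None else None,
                self.forward(right) if right is not None else None)

class ValueModel(nn.Module):
    # Stacked tree convolutions followed by a small normalized fully connected
    # head that reads the root feature and outputs one predicted cost.
    def __init__(self, node_dim, hidden=128, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            [TreeConv(node_dim, hidden)] +
            [TreeConv(hidden, hidden) for _ in range(n_layers - 1)])
        self.head = nn.Sequential(
            nn.LayerNorm(hidden), nn.Linear(hidden, 64), nn.ReLU(),
            nn.LayerNorm(64), nn.Linear(64, 1))

    def forward(self, tree):
        for layer in self.layers:
            tree = layer(tree)
        root_feature = tree[0]
        return self.head(root_feature)
```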

Finally, the prediction target of the value model also differs from NEO's: NEO predicts the minimum cost of the complete query plan that each expanded sub-plan can generate, whereas the present invention simply predicts the cost of the input query plan itself.

Regarding plan search, the present invention models the process of generating a better query plan as a Markov process, uses Monte Carlo tree search (MCTS) to expand sub-query plans, and uses the above value model as the heuristic function of the Monte Carlo tree search. Note: during the search, the query optimizer of a traditional database (such as PostgreSQL) is used to determine the search space at each step, i.e. which child nodes can be expanded in the current step, so that the final generated query plan is semantically consistent with the initial query plan. All existing cost-based query optimizers support this capability, so it is not described further here. The present invention uses the vector tree encodings of semantically consistent sub-query plans as the tree nodes of the Monte Carlo tree search, and uses the value model to evaluate the cost of a sub-query plan at each search step.

As shown in Figure 6, the Monte Carlo tree search process is as follows (a minimal code sketch is given after the steps):

(1) Selection: starting from the root node R, recursively select the best child node using the UCT algorithm until a leaf node L is reached.

(2) Expansion: if L is not a terminal node (that is, it does not yet fully express the whole query plan), create, according to the rules, one or more child nodes that do not contradict the semantics of the query, and select one of them, C.

(3) Simulation: starting from C, and using the same rules as in step (2) above, randomly select non-contradictory child nodes until the current query plan is fully expressed.

(4) Backpropagation: use the value model to predict the cost of the simulated complete query plan, and starting from this leaf node, update the visit counts and rewards (i.e. the average cost) of all of its ancestor nodes.

(5) Repeat steps (1)-(4) above, continually expanding the sub-query plan until a complete query plan is generated.
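A minimal sketch of steps (1)-(5) is given below, as referenced above. The callbacks legal_children, is_complete and predict_cost are hypothetical stand-ins for the rule-based plan expander and the value model, the reward is taken as the negative predicted cost so that cheaper plans score higher, and the final read-out of the most visited branch is one possible way to extract the resulting plan.

```python
import math
import random

class Node:
    def __init__(self, subplan, parent=None):
        self.subplan, self.parent = subplan, parent
        self.children, self.visits, self.total_reward = [], 0, 0.0

def uct(node, c=1.4):
    # Standard UCT score; unvisited nodes are explored first.
    if node.visits == 0:
        return float("inf")
    return (node.total_reward / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts_plan_search(root_subplan, legal_children, is_complete, predict_cost,
                     iterations=200):
    root = Node(root_subplan)
    for _ in range(iterations):
        node = root
        # (1) Selection
        while node.children:
            node = max(node.children, key=uct)
        # (2) Expansion
        if not is_complete(node.subplan):
            node.children = [Node(s, node) for s in legal_children(node.subplan)]
            node = random.choice(node.children)
        # (3) Simulation: random rollout until a complete plan is reached
        plan = node.subplan
        while not is_complete(plan):
            plan = random.choice(legal_children(plan))
        reward = -predict_cost(plan)          # value-model estimate, negated
        # (4) Backpropagation
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    # (5) After the search budget, follow the most visited children to read out a plan.
    best = root
    while best.children:
        best = max(best.children, key=lambda n: n.visits)
    return best.subplan
```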

Embodiment 2:

This embodiment is a query optimization device based on Monte Carlo tree search and reinforcement learning; the device is a storage medium in which at least one instruction is stored, the at least one instruction being loaded and executed by a processor to implement the query optimization method based on Monte Carlo tree search and reinforcement learning according to any one of claims 6 to 8.

The storage medium in this embodiment includes, but is not limited to, a USB flash drive, a hard disk, and the like.

Embodiment 3:

This embodiment is a query optimization device based on Monte Carlo tree search and reinforcement learning; the device comprises a processor and a memory, the memory storing at least one instruction, the at least one instruction being loaded and executed by the processor to implement the query optimization method based on Monte Carlo tree search and reinforcement learning according to any one of claims 6 to 8.

The devices in this embodiment include, but are not limited to, mobile terminals, PCs, servers, workstations, and the like.

The present invention may also have various other embodiments. Without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and modifications according to the present invention, and all such corresponding changes and modifications shall fall within the protection scope of the appended claims of the present invention.

Claims (10)

1. A query optimization system based on Monte Carlo tree search and reinforcement learning, the system adopting the same framework as the NEO query optimization model and comprising a value model unit and a query plan search unit, characterized in that: the value model unit predicts the cost of a query plan from the features corresponding to the query plan, based on the value model; the value model is a neural network; the input of the value model is a vector tree representing the query plan whose cost is to be estimated, the topology of the vector tree is a binary tree, and the node encodings are concatenated in the level-order traversal order of the tree; the feature of each node consists of the encoding of its node information, which includes the node type, the tables involved in the operation, the index types involved in the operation, the predicates involved in the operation, and the cardinality estimate of the node; and the query plan search unit uses the Monte Carlo tree search method, performing the query plan search according to the query plan -> execution time predictions and generating an execution plan from the search space.

2. The query optimization system based on Monte Carlo tree search and reinforcement learning according to claim 1, characterized in that the node information is encoded as follows: the node type is one-hot encoded; the tables involved in the operation are one-hot encoded; the index types involved in the operation are one-hot encoded; the predicates involved in the operation are one-hot encoded; the cardinality estimate of the node is an integer value.

3. The query optimization system based on Monte Carlo tree search and reinforcement learning according to claim 2, characterized in that the neural network used as the value model comprises a stacked tree convolution unit followed by two batch-normalization-based fully connected layers, the stacked tree convolution unit consisting of multiple stacked tree convolution layers.

4. The query optimization system based on Monte Carlo tree search and reinforcement learning according to claim 1, 2 or 3, characterized in that, when the query plan search unit searches using the Monte Carlo tree search method, the process of generating a better query plan is modeled as a Markov process, Monte Carlo tree search is used to expand sub-query plans, the vector tree encodings of semantically consistent sub-query plans are used as the tree nodes of the Monte Carlo tree search, and the value model is used to evaluate the cost of a sub-query plan at each search step, thereby realizing the Monte Carlo tree search.

5. The query optimization system based on Monte Carlo tree search and reinforcement learning according to claim 4, characterized in that the search process of the query plan search unit using the Monte Carlo tree search method comprises the following steps: (1) Selection: starting from the root node R, recursively select the best child node using the UCT algorithm until a leaf node L is reached; (2) Expansion: if L is not a terminal node, create one or more child nodes that do not contradict the semantics of the query, and select one of them, C; (3) Simulation: starting from C, randomly select non-contradictory child nodes until the current query plan is fully expressed; (4) Backpropagation: use the value model to predict the cost of the simulated complete query plan, and starting from this leaf node, update the visit counts and rewards of all of its ancestor nodes; (5) Repeat steps (1)-(4), continually expanding the sub-query plan until a complete query plan is generated.

6. A query optimization method based on Monte Carlo tree search and reinforcement learning, characterized by comprising the following steps: using the value model as the query plan -> execution time prediction model, and performing the query plan search on the query plan -> execution time prediction model; the value model is a neural network; the input of the value model is a vector tree representing the query plan whose cost is to be estimated, the topology of the vector tree is a binary tree, and the node encodings are concatenated in the level-order traversal order of the tree; the feature of each node consists of the encoding of its node information, which includes the node type, the tables involved in the operation, the index types involved in the operation, the predicates involved in the operation, and the cardinality estimate of the node; the query plan search uses the Monte Carlo tree search method, searching for a query plan according to the query plan -> execution time predictions and generating an execution plan from the search space.

7. The query optimization method based on Monte Carlo tree search and reinforcement learning according to claim 6, characterized in that the training process of the value model comprises the following steps: S1. Take the execution plan and execution time corresponding to a query as an experience and construct an experience set; use the execution plan and execution time of each experience in the experience set as one training sample for the value model; S2. Featurize the training samples, i.e. extract features, including query features and execution plan features; S3. Train the value model with the extracted features, the trained value model being the established query plan -> execution time prediction model.

8. The query optimization method based on Monte Carlo tree search and reinforcement learning according to claim 7, characterized in that the search process using the Monte Carlo tree search method comprises the following steps: (1) Selection: starting from the root node R, recursively select the best child node using the UCT algorithm until a leaf node L is reached; (2) Expansion: if L is not a terminal node, create one or more child nodes that do not contradict the semantics of the query, and select one of them, C; (3) Simulation: starting from C, randomly select non-contradictory child nodes until the current query plan is fully expressed; (4) Backpropagation: use the value model to predict the cost of the simulated complete query plan, and starting from this leaf node, update the visit counts and rewards of all of its ancestor nodes; (5) Repeat steps (1)-(4), continually expanding the sub-query plan until a complete query plan is generated.

9. A query optimization device based on Monte Carlo tree search and reinforcement learning, characterized in that the device is a storage medium storing at least one instruction, the at least one instruction being loaded and executed by a processor to implement the query optimization method based on Monte Carlo tree search and reinforcement learning according to any one of claims 6 to 8.

10. A query optimization device based on Monte Carlo tree search and reinforcement learning, characterized in that the device comprises a processor and a memory, the memory storing at least one instruction, the at least one instruction being loaded and executed by the processor to implement the query optimization method based on Monte Carlo tree search and reinforcement learning according to any one of claims 6 to 8.
CN202210319758.6A 2022-03-29 2022-03-29 Query optimization system, method and equipment based on Monte Carlo tree search and reinforcement learning Pending CN114637775A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210319758.6A CN114637775A (en) 2022-03-29 2022-03-29 Query optimization system, method and equipment based on Monte Carlo tree search and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210319758.6A CN114637775A (en) 2022-03-29 2022-03-29 Query optimization system, method and equipment based on Monte Carlo tree search and reinforcement learning

Publications (1)

Publication Number Publication Date
CN114637775A true CN114637775A (en) 2022-06-17

Family

ID=81951440

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210319758.6A Pending CN114637775A (en) 2022-03-29 2022-03-29 Query optimization system, method and equipment based on Monte Carlo tree search and reinforcement learning

Country Status (1)

Country Link
CN (1) CN114637775A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115168408A (en) * 2022-08-16 2022-10-11 北京永洪商智科技有限公司 Query optimization method, device, equipment and storage medium based on reinforcement learning
US20230315702A1 (en) * 2022-03-30 2023-10-05 Microsoft Technology Licensing, Llc Constraint-based index tuning in database management systems utilizing reinforcement learning
CN118520008A (en) * 2024-07-25 2024-08-20 浙江大学 Spark SQL-oriented intelligent query optimization method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160103900A1 (en) * 2014-10-08 2016-04-14 University Of Lancaster Data structuring and searching methods and apparatus
CN111611274A (en) * 2020-05-28 2020-09-01 华中科技大学 A database query optimization method and system
CN113515540A (en) * 2021-06-09 2021-10-19 清华大学 Query rewriting method for database
CN114036388A (en) * 2021-11-16 2022-02-11 北京百度网讯科技有限公司 Data processing method and apparatus, electronic device, and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160103900A1 (en) * 2014-10-08 2016-04-14 University Of Lancaster Data structuring and searching methods and apparatus
CN111611274A (en) * 2020-05-28 2020-09-01 华中科技大学 A database query optimization method and system
CN113515540A (en) * 2021-06-09 2021-10-19 清华大学 Query rewriting method for database
CN114036388A (en) * 2021-11-16 2022-02-11 北京百度网讯科技有限公司 Data processing method and apparatus, electronic device, and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230315702A1 (en) * 2022-03-30 2023-10-05 Microsoft Technology Licensing, Llc Constraint-based index tuning in database management systems utilizing reinforcement learning
US12066993B2 (en) * 2022-03-30 2024-08-20 Microsoft Technology Licensing, Llc Constraint-based index tuning in database management systems utilizing reinforcement learning
CN115168408A (en) * 2022-08-16 2022-10-11 北京永洪商智科技有限公司 Query optimization method, device, equipment and storage medium based on reinforcement learning
CN115168408B (en) * 2022-08-16 2024-05-28 北京永洪商智科技有限公司 Query optimization method, device, equipment and storage medium based on reinforcement learning
CN118520008A (en) * 2024-07-25 2024-08-20 浙江大学 Spark SQL-oriented intelligent query optimization method and system

Similar Documents

Publication Publication Date Title
CN111597209B (en) Database materialized view construction system, method and system creation method
CN111611274B (en) A database query optimization method and system
Angles A comparison of current graph database models
CN114637775A (en) Query optimization system, method and equipment based on Monte Carlo tree search and reinforcement learning
Le et al. Scalable keyword search on large RDF data
CN111444220B (en) Cross-platform SQL query optimization method combining rule driving and data driving
US10762087B2 (en) Database search
US20210018332A1 (en) Poi name matching method, apparatus, device and storage medium
CN113515539B (en) A method of querying data in a database
CN107193882B (en) A why-not query answering method based on graph matching on RDF data
CN115617830A (en) Data query optimization processing method and device based on machine learning
CN110059264A (en) Location search method, equipment and the computer storage medium of knowledge based map
WO2021139753A1 (en) Method and apparatus for processing database
CN104137095A (en) System for evolutionary analytics
US9594783B2 (en) Index selection for XML database systems
US12339841B2 (en) Database management system and method for graph view selection for a relational-graph database
CN118227653A (en) Method for converting full-link natural language into structured query language
WO2024239782A1 (en) Query plan construction method and apparatus, electronic device and storage medium
CN110750560B (en) A system and method for optimizing network multi-connection
Fang et al. A query-level distributed database tuning system with machine learning
CN118445315A (en) Query rewriting method and system for database system
CN117390063A (en) Listwise ordering learning-based database querier optimization method
CN117113075A (en) Query optimization field cost estimation method based on graph attention network
Xiong et al. AutoQuo: An Adaptive plan optimizer with reinforcement learning for query plan selection
Liu et al. Learned optimizer for online approximate query processing in data exploration

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20220617
