+

CN116244333B - Database query performance prediction method and system based on cost factor calibration - Google Patents

Database query performance prediction method and system based on cost factor calibration

Info

Publication number
CN116244333B
CN116244333B CN202310124785.2A CN202310124785A CN116244333B CN 116244333 B CN116244333 B CN 116244333B CN 202310124785 A CN202310124785 A CN 202310124785A CN 116244333 B CN116244333 B CN 116244333B
Authority
CN
China
Prior art keywords
cost
training data
execution plan
cost factor
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310124785.2A
Other languages
Chinese (zh)
Other versions
CN116244333A (en
Inventor
彭朝晖
王一博
张甲康
陈倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202310124785.2A priority Critical patent/CN116244333B/en
Publication of CN116244333A publication Critical patent/CN116244333A/en
Application granted granted Critical
Publication of CN116244333B publication Critical patent/CN116244333B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of data processing, and provides a database query performance prediction method and system based on cost factor calibration, wherein the method comprises the steps of calibrating cost factors in a database query optimizer module in a statistical regression mode; extracting class characteristics of nodes of the execution plan by adopting a dictionary coding mode for the history execution plan executed in the optimizer, generating embedded characteristics for the class characteristics of the nodes of the execution plan through a characteristic processing neural network, splicing the embedded characteristics into an embedded characteristic tree according to an execution structure of the original execution plan, performing binary treeing on the embedded characteristic tree, training the convolutional neural network according to the embedded characteristic tree, and obtaining query performance by adopting the trained tree-shaped convolutional neural network for the execution plan to be processed. The more accurate and more effective query performance prediction effect is realized.

Description

Database query performance prediction method and system based on cost factor calibration
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a database query performance prediction method and system based on cost factor calibration.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In recent years, much research effort has focused on both optimizing the traditional cost model of the database and using neural networks to learn and replace the traditional cost model. For the former, the computing power of the traditional cost model of the database can be effectively improved, the iterative update of the database optimizer is promoted, but the cost computing formula can not be obtained all the time, the existing formula can not accurately predict the query performance in many complex scenes, the focus of the existing cost formula is mostly the relative magnitude value of the cost between execution plans, no specific value unit exists, and for different hardware and execution environments, the default relative ratio between computing units is used, and the cost computing result can not reflect the execution performance of the query well. For the latter, the neural network can effectively fit the execution performance of the query, but the feature construction aspect is often low-density and sparse in information, so that the association and structure information between nodes in an execution plan cannot be well extracted by the features input by the network model, meanwhile, the knowledge of the existing cost model in the database is often ignored by the existing machine learning method, and the prediction accuracy of the query performance is greatly improved.
For the two current query performance prediction modes, the common advantages are that the used technologies are mature, but lack of fusion, and three problems of query performance prediction are not solved, namely firstly, the attention point of a cost model lacks accuracy in the aspect of query performance prediction because of comparing the relative values of planned costs, and a cost formula facing the aspect of query performance needs to be designed for supplement or correction. Secondly, many neural networks ignore partial order relations in an execution plan when constructing features, obtain sparse coding information for the execution nodes, and cannot capture correlation of the execution nodes and tree structure information of the execution plan well. Third, the existing neural network does not combine the cost model knowledge of the database to learn, and cannot effectively fuse the domain knowledge of the cost model of the database.
Disclosure of Invention
In order to solve at least one technical problem in the background art, the invention provides a database query performance prediction method and system based on cost factor calibration, which effectively fuses the domain knowledge of a cost model in a database query optimizer, establishes a feature processing neural network and an execution plan-oriented tree convolution neural network, effectively captures the relevance of execution nodes and tree structure information of an execution plan, and effectively improves the accuracy of a query performance prediction result. The method can solve key problems in query optimization and query scheduling.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
The first aspect of the invention provides a database query performance prediction method based on cost factor calibration, which comprises the following steps:
acquiring historical data of an execution plan corresponding to the regression query statement;
fitting the numerical value of the cost factor in the cost formula by adopting a statistical regression calibration method according to the historical data of the execution plan corresponding to the regression query statement to obtain a cost model of the database query optimizer;
Acquiring an original training data set, acquiring a training data feature set and a training data label set based on the original training data set, training a tree-shaped deep convolutional neural network model based on the training data feature set and the training data label set, and obtaining a deep calibration model;
and aiming at the execution plan to be predicted, combining a depth calibration model to obtain a predicted error time offset, obtaining cost time based on a database query optimizer cost model, and summing the predicted error time offset and the cost time to obtain the execution time of the query execution plan to be predicted.
A second aspect of the present invention provides a database query performance prediction system based on cost factor calibration, comprising:
The data acquisition module is used for acquiring historical data of the execution plan corresponding to the regression inquiry statement;
The cost model construction module is used for fitting the numerical value of the cost factor in the cost formula by adopting a statistical regression calibration method according to the historical data of the execution plan corresponding to the regression query statement to obtain a cost model of the database query optimizer;
the depth calibration model building module is used for acquiring an original training data set, acquiring a training data feature set and a training data label set based on the original training data set, training a tree-shaped depth convolutional neural network model based on the training data feature set and the training data label set, and acquiring a depth calibration model;
The query performance prediction module is used for obtaining a predicted error time offset by combining a depth calibration model aiming at an execution plan to be predicted, obtaining cost time based on a database query optimizer cost model, and summing the predicted error time offset and the cost time to obtain the execution time of the execution plan to be predicted.
A third aspect of the present invention provides a computer-readable storage medium.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of a method of predicting database query performance based on cost factor calibration as described above.
A fourth aspect of the invention provides a computer device.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in a cost factor calibration based database query performance prediction method as described above when the program is executed.
Compared with the prior art, the invention has the beneficial effects that:
1. the neural network prediction scheme used by the invention effectively merges the domain knowledge of the cost model in the database query optimizer, and can effectively improve the accuracy of the prediction result.
2. According to the invention, the cost factors are calibrated in a statistical regression mode, each cost factor is adjusted to be a numerical value which is suitable for the current hardware environment, and the numerical value represents the real execution time of the cost factor, so that the result calculated by the cost formula is more referential.
3. The invention adopts the characteristic processing neural network for the coding of the execution plan nodes, converts the characteristic vector with sparse information into the embedded vector with dense information, and effectively enriches the characteristic information of the execution nodes.
4. According to the invention, for the performance prediction model of the execution plan, a tree-shaped convolutional neural network facing the execution plan is established, the mutual relevance of the execution nodes and the tree-shaped structure information of the execution plan are effectively captured, the model prediction error time offset is used, and the query performance jointly predicted by the result of the joint database cost formula is more accurate.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a flowchart of an overall prediction method for predicting database query performance according to an embodiment of the present invention;
FIG. 2 is a flowchart of an overall framework for implementing a method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a neural network for feature processing according to an embodiment of the present invention;
FIG. 4 is a binary tree representation of a query execution plan in accordance with an embodiment of the present invention;
fig. 5 is a schematic diagram of a tree convolutional neural network according to an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
As shown in fig. 1-2, the present embodiment provides a method for predicting database query performance based on cost factor calibration, including the following steps:
s1, acquiring historical data of an execution plan corresponding to a regression query statement;
s2, fitting the numerical value of the cost factor in the cost formula by adopting a statistical regression calibration method according to the historical data of the execution plan corresponding to the regression query statement to obtain a cost model of the database query optimizer;
S3, acquiring an original training data set, and acquiring a training data feature set and a training data label set based on the original training data set;
and S4, aiming at the execution plan to be predicted, combining a depth calibration model to obtain a predicted error time offset, obtaining cost time based on a database query optimizer cost model, and summing the predicted error time offset and the cost time to obtain the execution time of the query execution plan to be predicted.
The step S1 of obtaining the historical data of the execution plan corresponding to the regression query statement comprises the following steps:
and inputting and operating a group of regression query sentences into the database to obtain and store historical data of execution plans corresponding to the regression query sentences, wherein the historical data comprises information such as a base number, the number of scanning blocks, execution time and the like.
In this embodiment, the regression query includes the following five queries Q1 through Q5, wherein R, T is the relationship table present in the database and A, B is the index column on the R relationship table.
Q1:SELECT*FROM R
Q2:SELECT*FROM T
Q3:SELECT COUNT(*)FROM R
Q4:SELECT*FROM R where R.A<a
Q5:SELECT*FROM R where R.B<b
The historical data of the execution plan corresponding to the regression query statement comprises real execution time T1-T5 corresponding to Q1-Q5, R table base number n R, R table occupied data page number p R, T table base number n T, T table occupied data page number p T, COUNT function processing line number n F, R.A index processed line number n RA, R.A index occupied data page number p RA, R.A < a filter operation processed line number n FA, R.B index processed line number n RB, R.B index occupied data page number p RB, R.B < b filter operation processed line number n FB.
S2, fitting the numerical value of the cost factor in the cost formula by adopting a statistical regression calibration method according to the historical data of the execution plan corresponding to the regression query statement to obtain a cost model of the database query optimizer;
And calibrating the cost model of the database query optimizer according to the real execution time of the regression query statement. And fitting the numerical value of the cost factor in the cost formula by adopting a statistical regression calibration method, and giving the actual representation meaning of the cost factor, wherein the execution time is taken as a measurement unit.
The cost formulas corresponding to the regression queries Q1 through Q5 are as follows:
Q1:Cost=pR·cs+nR·ct
Q2:Cost=pT·cs+nT·ct
Q3:Cost=pR·cs+nR·ct+nF·co
Q4:Cost=pRA·cr+nRA·ct+nRA·ci+nFA·co
Q5:Cost=pRB·cr+nRB·ct+nRB·ci+nFB·co
The cost factors of the statistical regression calibration comprise an I/O cost factor c s for scanning one data page sequentially, an I/O cost factor c r for scanning one data page randomly, a CPU cost factor c t for processing one row of data, a CPU cost factor c i for processing one index item and a CPU cost factor c o for processing each function or operator.
The cost calculation formula related to each query in the regression query group comprises cost factors of different types, the regression query with enough quantity is designed and ensured in the mode, the real execution time of each query and the operation quantity of each cost factor are executed and collected to serve as the true value result and coefficient of the cost calculation formula, and the numerical values of the different cost factors are obtained through statistical regression by different query combinations;
And carrying out Q1-Q5 cost formulas to obtain the numerical value of the cost factor through the obtained execution plan historical data, including information such as the base number, the number of scanning blocks, the execution time and the like.
S3, acquiring an original training data set, and acquiring a training data feature set and a training data label set based on the original training data set;
S3, the original training data set acquisition method comprises the following steps:
A set of SQL query workloads are input and run into the database system, which generates a corresponding execution plan for each SQL, each execution plan comprising a plurality of operations and execution times, and after each SQL execution, the obtained execution plan is stored as an original training dataset.
In S3, feature encoding processing is performed on the execution plan of the original training data set, as shown in fig. 3 to 4.
Firstly, extracting category characteristics from an execution plan node, extracting database system table data and generating a dictionary table, and carrying out dictionary coding on the execution plan node according to the dictionary table;
The category characteristics contained in the dictionary comprise category characteristics such as node types, connection modes, scanning directions, relation table names, relation table aliases, index names, aggregation strategies and the like;
The node types comprise a scanning node type, a control node type, a materialized node type and a connection node type, wherein the scanning node type comprises a Seq Scan, an Index Scan, a Bitmap Index/Heap Scan and the like, the control node type comprises an application and the like, the materialized node type comprises a Materialize, sort, group, aggregation and the like, and the connection node type comprises a close Loop Join, a Hash Join, a Merge Join and the like.
And constructing dictionary table data according to the extracted metadata information, namely database system table data, wherein the numerical value of each item in the dictionary is one-hot code, for example, if the node type information of the extracted node from the execution plan is Seq Scan, the numerical value at the corresponding position of the Seq Scan is 1, and the numerical value at the rest positions of the node types is 0.
Based on an original training data set, sequentially extracting information of nodes in each execution plan in a subsequent traversing mode, coding the nodes according to category characteristic information in a dictionary table to obtain dictionary coding characteristics, and taking the running time of the nodes as a label corresponding to the characteristics;
Training the feature processing neural network based on dictionary coding features and corresponding labels to obtain a trained feature processing neural network;
The characteristic neural network is characterized in that a first layer is an input layer, a middle 6 layer is a hidden layer, a last layer is an output layer, the first two hidden layers are RNN layers, the third hidden layer is a flattening layer, the last three hidden layers are full-connection layers, activation functions used by the hidden layers and the output layer are Relu, the first two RNN hidden layers output 35 nodes, each node dimension is 128, the number of the nodes output by the flattening layer is 4480, the number of the nodes of the last three full-connection layers is 256, 128 and 64 in sequence, the loss function is a mean square error cost function, the obtained embedded vector is used as a characteristic to process the tail end hidden layers of the neural network, and the output layer is real time for executing a planned node operation.
And processing the neural network based on the original training data set and the trained features to obtain an embedded vector of each node of the original training data set execution plan.
The acquiring the training data feature set and the training data label set based on the original training data set comprises:
Executing the embedded vector of each node of the plan based on the original training data set, combining the embedded vectors according to the tree structure of the execution plan, and performing binary trepanning on the embedded vector tree to generate a feature vector binary tree of each plan, so as to obtain a training data feature set;
and (3) making a difference between the real execution time of each execution plan of the original training data set and the cost time calculated by the cost factor to obtain an error time offset, and obtaining a training data tag set.
S3, training the tree-shaped deep convolutional neural network model based on the training data feature set and the training data label set to obtain a deep calibration model;
As shown in FIG. 5, the model structure of the depth calibration model is that the first layer is an input layer, the middle 7 layers are hidden layers, the last layer is an output layer, the first three hidden layers are convolution layers, the fourth hidden layers are pooling layers, the last three hidden layers are full-connection layers, the activation functions used by the hidden layers and the output layers are Relu, the number of nodes of the first three convolution layers is 512, 256 and 128 in sequence, the number of nodes output by the pooling layers of the hidden layers is 128, the number of nodes of the last three full-connection layers is 128, 64 and 32 in sequence, the loss function is a mean square error cost function, and the obtained output layer is the error time offset of an execution plan node.
And S4, aiming at the execution plan to be predicted, combining a depth calibration model to obtain a predicted error time offset, obtaining cost time based on a database query optimizer cost model, and summing the predicted error time offset and the cost time to obtain the execution time of the query execution plan to be predicted.
Example two
The embodiment provides a database query performance prediction system based on cost factor calibration, which comprises the following steps:
The data acquisition module is used for acquiring historical data of the execution plan corresponding to the regression inquiry statement;
The cost model construction module is used for fitting the numerical value of the cost factor in the cost formula by adopting a statistical regression calibration method according to the historical data of the execution plan corresponding to the regression query statement to obtain a cost model of the database query optimizer;
the depth calibration model building module is used for acquiring an original training data set, acquiring a training data feature set and a training data label set based on the original training data set, training a tree-shaped depth convolutional neural network model based on the training data feature set and the training data label set, and acquiring a depth calibration model;
The query performance prediction module is used for obtaining a predicted error time offset by combining a depth calibration model aiming at an execution plan to be predicted, obtaining cost time based on a database query optimizer cost model, and summing the predicted error time offset and the cost time to obtain the execution time of the execution plan to be predicted.
Example III
The present embodiment provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of a database query performance prediction method based on cost factor calibration as described above.
Example IV
The present embodiment provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps in a database query performance prediction method based on cost factor calibration as described above when the program is executed by the processor.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random-access Memory (Random Access Memory, RAM), or the like.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1.一种基于代价因子校准的数据库查询性能预测方法,其特征在于,包括如下步骤:1. A database query performance prediction method based on cost factor calibration, characterized by comprising the following steps: 获取回归查询语句相对应执行计划的历史数据;Obtain historical data of the execution plan corresponding to the regression query statement; 根据回归查询语句相对应执行计划的历史数据,采用统计回归校准的方法,拟合代价公式中代价因子的数值,得到数据库查询优化器代价模型;Based on the historical data of the execution plan corresponding to the regression query statement, the statistical regression calibration method is used to fit the value of the cost factor in the cost formula to obtain the database query optimizer cost model; 获取原始训练数据集,基于原始训练数据集获取训练数据特征集和训练数据标签集;基于训练数据特征集和训练数据标签集,对树形深度卷积神经网络模型训练,得到深度校准模型;Obtaining an original training data set, and obtaining a training data feature set and a training data label set based on the original training data set; training a tree-shaped deep convolutional neural network model based on the training data feature set and the training data label set to obtain a deep calibration model; 针对待预测的执行计划,结合深度校准模型,得到预测的误差时间偏移量,基于数据库查询优化器代价模型得到成本时间,将预测的误差时间偏移量和成本时间求和得到查询待预测执行计划的执行时间;For the execution plan to be predicted, the deep calibration model is combined to obtain the predicted error time offset. The cost time is obtained based on the database query optimizer cost model. The predicted error time offset and the cost time are summed to obtain the execution time of the query execution plan to be predicted. 其中,所述代价公式中代价因子包括顺序扫描一个数据页面的I/O代价因子、随机扫描一个数据页面的I/O代价因子、处理一行数据的CPU代价因子、处理一个索引项的CPU代价因子、处理每个函数或操作符的CPU代价因子;The cost factors in the cost formula include the I/O cost factor of sequentially scanning a data page, the I/O cost factor of randomly scanning a data page, the CPU cost factor of processing a row of data, the CPU cost factor of processing an index item, and the CPU cost factor of processing each function or operator; 其中,所述基于原始训练数据集获取训练数据特征集和训练数据标签集包括:The step of obtaining a training data feature set and a training data label set based on the original training data set includes: 基于原始训练数据集执行计划每个节点的嵌入向量,将嵌入向量按执行计划的树形结构组合,并将嵌入向量树进行二叉树化,生成每个计划的特征向量二叉树,得到训练数据特征集;Based on the embedding vector of each node of the execution plan of the original training data set, the embedding vectors are combined according to the tree structure of the execution plan, and the embedding vector tree is binarized to generate a binary tree of feature vectors for each plan to obtain the training data feature set; 将原始训练数据集每个执行计划真实执行时间与通过代价因子计算出的成本时间做差,获得误差时间偏移量,得到训练数据标签集。The actual execution time of each execution plan in the original training data set is subtracted from the cost time calculated by the cost factor to obtain the error time offset and the training data label set. 2.如权利要求1所述的一种基于代价因子校准的数据库查询性能预测方法,其特征在于,所述根据回归查询语句相对应执行计划的历史数据,采用统计回归校准的方法,拟合代价公式中代价因子的数值包括:2. The database query performance prediction method based on cost factor calibration according to claim 1, wherein the cost factor values in the cost formula are fitted using a statistical regression calibration method based on historical data of execution plans corresponding to regression query statements, and the cost factor values in the cost formula are fitted using the following methods: 基于历史数据获取所有回归查询语句所对应代价公式中的已知量;Obtain known quantities in the cost formula corresponding to all regression query statements based on historical data; 将已知量代入相应的代价公式,联立所有代价公式求解得到代价因子的数值。Substitute the known quantity into the corresponding cost formula, solve all the cost formulas together to obtain the value of the cost factor. 3.如权利要求1所述的一种基于代价因子校准的数据库查询性能预测方法,其特征在于,回归查询中每条查询所涉及的代价计算公式包含不同类型的代价因子,执行并收集每条查询的真实执行时间和每种代价因子的操作数量,作为代价计算公式的真值结果和系数,通过不同的查询组合来统计回归得到不同代价因子的数值。3. A database query performance prediction method based on cost factor calibration as described in claim 1 is characterized in that the cost calculation formula involved in each query in the regression query contains different types of cost factors, the actual execution time of each query and the number of operations of each cost factor are executed and collected, and the true value result and coefficient of the cost calculation formula are used as the true value result and coefficient of the cost calculation formula, and statistical regression is performed through different query combinations to obtain the numerical values of different cost factors. 4.如权利要求1所述的一种基于代价因子校准的数据库查询性能预测方法,其特征在于,所述原始训练数据集执行计划每个节点的嵌入向量的生成方法为:4. The database query performance prediction method based on cost factor calibration according to claim 1, wherein the embedding vector of each node in the execution plan of the original training dataset is generated by: 基于原始训练数据集,按照后续遍历的方式依次抽取每个执行计划中节点的信息,按照字典表中的类别特征信息对节点进行编码得到字典编码特征,并将节点的运行时间作为该特征所对应的标签;基于字典编码特征和所对应的标签对特征处理神经网络进行训练得到训练后的特征处理神经网络;Based on the original training data set, the node information in each execution plan is extracted in sequence through subsequent traversal. The nodes are encoded according to the category feature information in the dictionary table to obtain dictionary encoding features, and the node running time is used as the label corresponding to the feature. The feature processing neural network is trained based on the dictionary encoding features and the corresponding labels to obtain a trained feature processing neural network. 基于原始训练数据集和训练后的特征处理神经网络得到原始训练数据集执行计划每个节点的嵌入向量。Based on the original training dataset and the trained feature processing neural network, the embedding vector of each node of the execution plan of the original training dataset is obtained. 5.如权利要求3所述的一种基于代价因子校准的数据库查询性能预测方法,其特征在于,所述回归查询语句相对应执行计划的历史数据包括:基数、扫描块数和执行时间。5. A database query performance prediction method based on cost factor calibration as described in claim 3, characterized in that the historical data of the execution plan corresponding to the regression query statement includes: cardinality, number of scan blocks and execution time. 6.一种基于代价因子校准的数据库查询性能预测系统,其特征在于,包括:6. A database query performance prediction system based on cost factor calibration, characterized by comprising: 数据获取模块,用于获取回归查询语句相对应执行计划的历史数据;The data acquisition module is used to obtain historical data of the execution plan corresponding to the regression query statement; 代价模型构建模块,用于根据回归查询语句相对应执行计划的历史数据,采用统计回归校准的方法,拟合代价公式中代价因子的数值,得到数据库查询优化器代价模型;其中,所述代价公式中代价因子包括顺序扫描一个数据页面的I/O代价因子、随机扫描一个数据页面的I/O代价因子、处理一行数据的CPU代价因子、处理一个索引项的CPU代价因子、处理每个函数或操作符的CPU代价因子;A cost model construction module is used to fit the values of cost factors in a cost formula using a statistical regression calibration method based on historical data of execution plans corresponding to regression query statements to obtain a database query optimizer cost model; wherein the cost factors in the cost formula include the I/O cost factor of sequentially scanning a data page, the I/O cost factor of randomly scanning a data page, the CPU cost factor of processing a row of data, the CPU cost factor of processing an index item, and the CPU cost factor of processing each function or operator; 深度校准模型构建模块,用于获取原始训练数据集,基于原始训练数据集获取训练数据特征集和训练数据标签集;基于训练数据特征集和训练数据标签集,对树形深度卷积神经网络模型训练,得到深度校准模型;A deep calibration model construction module is used to obtain an original training data set, obtain a training data feature set and a training data label set based on the original training data set; and train a tree-shaped deep convolutional neural network model based on the training data feature set and the training data label set to obtain a deep calibration model. 其中,所述基于原始训练数据集获取训练数据特征集和训练数据标签集包括:The step of obtaining a training data feature set and a training data label set based on the original training data set includes: 基于原始训练数据集执行计划每个节点的嵌入向量,将嵌入向量按执行计划的树形结构组合,并将嵌入向量树进行二叉树化,生成每个计划的特征向量二叉树,得到训练数据特征集;Based on the embedding vector of each node of the execution plan of the original training data set, the embedding vectors are combined according to the tree structure of the execution plan, and the embedding vector tree is binarized to generate a binary tree of feature vectors for each plan to obtain the training data feature set; 将原始训练数据集每个执行计划真实执行时间与通过代价因子计算出的成本时间做差,获得误差时间偏移量,得到训练数据标签集;Subtract the actual execution time of each execution plan in the original training dataset from the cost time calculated using the cost factor to obtain the error time offset and the training data label set. 查询性能预测模块,用于针对待预测的执行计划,结合深度校准模型,得到预测的误差时间偏移量,基于数据库查询优化器代价模型得到成本时间,将预测的误差时间偏移量和成本时间求和得到查询待预测执行计划的执行时间。The query performance prediction module is used to obtain the predicted error time offset for the execution plan to be predicted in combination with the deep calibration model, obtain the cost time based on the database query optimizer cost model, and sum the predicted error time offset and the cost time to obtain the execution time of the query execution plan to be predicted. 7.一种计算机可读存储介质,其上存储有计算机程序,其特征在于,该程序被处理器执行时实现如权利要求1-5中任一项所述的一种基于代价因子校准的数据库查询性能预测方法中的步骤。7. A computer-readable storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the program implements the steps of the database query performance prediction method based on cost factor calibration according to any one of claims 1 to 5. 8.一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,其特征在于,所述处理器执行所述程序时实现如权利要求1-5中任一项所述的一种基于代价因子校准的数据库查询性能预测方法中的步骤。8. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, the processor implements the steps of a database query performance prediction method based on cost factor calibration as described in any one of claims 1 to 5.
CN202310124785.2A 2023-02-16 2023-02-16 Database query performance prediction method and system based on cost factor calibration Active CN116244333B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310124785.2A CN116244333B (en) 2023-02-16 2023-02-16 Database query performance prediction method and system based on cost factor calibration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310124785.2A CN116244333B (en) 2023-02-16 2023-02-16 Database query performance prediction method and system based on cost factor calibration

Publications (2)

Publication Number Publication Date
CN116244333A CN116244333A (en) 2023-06-09
CN116244333B true CN116244333B (en) 2025-07-29

Family

ID=86623625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310124785.2A Active CN116244333B (en) 2023-02-16 2023-02-16 Database query performance prediction method and system based on cost factor calibration

Country Status (1)

Country Link
CN (1) CN116244333B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116821171B (en) * 2023-06-27 2024-04-19 杭州云之重器科技有限公司 Method for generating new virtual view to accelerate computing task
CN117251351B (en) * 2023-11-10 2024-04-05 支付宝(杭州)信息技术有限公司 Database performance prediction method and related equipment
CN117520385B (en) * 2024-01-05 2024-04-16 凯美瑞德(苏州)信息科技股份有限公司 Database query optimization method based on exploration value and query cost
CN118394616B (en) * 2024-04-23 2025-03-04 朴道征信有限公司 Model deployment method, device, electronic equipment and computer readable medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105243068A (en) * 2014-07-09 2016-01-13 华为技术有限公司 Database system query method, server and energy consumption test system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11645283B2 (en) * 2021-04-26 2023-05-09 International Business Machined Corporation Predictive query processing
CN115617830A (en) * 2021-07-12 2023-01-17 中国移动通信集团黑龙江有限公司 Data query optimization processing method and device based on machine learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105243068A (en) * 2014-07-09 2016-01-13 华为技术有限公司 Database system query method, server and energy consumption test system

Also Published As

Publication number Publication date
CN116244333A (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN116244333B (en) Database query performance prediction method and system based on cost factor calibration
CN111597209B (en) Database materialized view construction system, method and system creation method
CN112784092A (en) Cross-modal image text retrieval method of hybrid fusion model
JP2007534087A (en) Partial query caching
CN112434024A (en) Relational database-oriented data dictionary generation method, device, equipment and medium
CN107491508B (en) Database query time prediction method based on recurrent neural network
WO2021139753A1 (en) Method and apparatus for processing database
CN115062070A (en) A question-and-answer-based text table data query method
CN114564543A (en) A method for obtaining carbon footprint based on knowledge graph
CN113360670A (en) Knowledge graph completion method and system based on fact context
CN114116778B (en) A database query optimization method
CN116150190A (en) Database query optimization processing method and system based on tree QRNN
CN110750560B (en) A system and method for optimizing network multi-connection
CN116975105A (en) Data processing method and device based on rule engine and computer equipment
CN115544070A (en) A Similarity Query Optimization Method Based on Trajectory Representation Learning
CN119719424A (en) Data lineage analysis method, device, terminal equipment and computer program product
CN117390063A (en) Listwise ordering learning-based database querier optimization method
CN117113075A (en) Query optimization field cost estimation method based on graph attention network
CN110245248A (en) A Keyword Query Method of Remote Sensing Image
CN114548105A (en) An Expert Information Completion and Update Method for Siamese Graph Neural Networks
Höltgen et al. Utilizing large language models for semantic enrichment of infrastructure condition data: a comparative study of GPT and Llama models
CN112800284A (en) A Fast Query Method Based on Environment Model Database
CN119046304B (en) Method and device for generating service engine instance, computer equipment and readable storage medium
CN118963746B (en) A low-code platform operation method, system, electronic device and storage medium
CN117390064B (en) Database query optimization method based on embeddable subgraph

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载