CN118410068B - Database SQL query optimization method, terminal and storage medium - Google Patents
Database SQL query optimization method, terminal and storage medium Download PDFInfo
- Publication number
- CN118410068B CN118410068B CN202410874325.6A CN202410874325A CN118410068B CN 118410068 B CN118410068 B CN 118410068B CN 202410874325 A CN202410874325 A CN 202410874325A CN 118410068 B CN118410068 B CN 118410068B
- Authority
- CN
- China
- Prior art keywords
- metadata
- query
- information
- index
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24542—Plan optimisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/24558—Binary matching operations
- G06F16/2456—Join operations
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Operations Research (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
技术领域Technical Field
本发明涉及数据库技术领域,尤其涉及一种数据库SQL查询优化方法、终端及存储介质。The present invention relates to the field of database technology, and in particular to a database SQL query optimization method, a terminal and a storage medium.
背景技术Background Art
随着数据库技术的发展和数据量的激增,数据库查询优化成为了数据库管理系统中的关键技术。传统的数据库查询优化依赖于静态的、事先定义好的规则和统计信息,这在数据分布变化和查询模式多样化的情境下往往表现不佳。特别是在面对大数据和高并发查询需求时,这种静态优化策略无法充分适应数据环境的快速变化,导致查询性能和处理效率下降。With the development of database technology and the surge in data volume, database query optimization has become a key technology in database management systems. Traditional database query optimization relies on static, pre-defined rules and statistical information, which often performs poorly in situations where data distribution changes and query patterns are diverse. Especially when faced with big data and high-concurrency query requirements, this static optimization strategy cannot fully adapt to the rapid changes in the data environment, resulting in reduced query performance and processing efficiency.
发明内容Summary of the invention
本发明实施例提供了一种数据库SQL查询优化方法、终端及存储介质,以解决现有方法无法充分适应数据环境的快速变化,导致查询性能和处理效率下降的问题。The embodiments of the present invention provide a database SQL query optimization method, a terminal and a storage medium to solve the problem that the existing methods cannot fully adapt to the rapid changes in the data environment, resulting in decreased query performance and processing efficiency.
第一方面,本发明实施例提供了一种数据库SQL查询优化方法,包括:In a first aspect, an embodiment of the present invention provides a database SQL query optimization method, comprising:
获取待处理的SQL查询语句,并将SQL查询语句转换为对应的抽象语法树;Obtain the SQL query statement to be processed, and convert the SQL query statement into a corresponding abstract syntax tree;
基于元数据查询接口,获取最新的元数据信息;Based on the metadata query interface, obtain the latest metadata information;
根据最新的元数据信息和抽象语法树,生成元数据增强后的抽象语法树;Generate an abstract syntax tree with metadata enhancement according to the latest metadata information and abstract syntax tree;
基于元数据增强后的抽象语法树,进行查询动态重组和优化,得到重组后的抽象语法树;Based on the abstract syntax tree enhanced by metadata, the query is dynamically reorganized and optimized to obtain a reorganized abstract syntax tree;
基于重组后的抽象语法树,执行SQL查询语句。Execute SQL query statements based on the reorganized abstract syntax tree.
在一种可能的实现方式中,元数据查询接口为元数据分析组件提供的用于查询最新的元数据信息的接口;In a possible implementation, the metadata query interface is an interface provided by the metadata analysis component for querying the latest metadata information;
元数据分析组件通过元数据监听器以事件订阅模式监控数据库的变化,并在监控到数据库的变化信息时,以差异的方式记录数据库的变化信息,以及采用延迟加载和惰性更新策略,在数据库的变化信息实际应用于SQL查询优化时,对数据库的变化信息进行加载和更新。The metadata analysis component monitors database changes in event subscription mode through metadata listeners, and records database change information in a differential manner when monitoring database change information. It also adopts delayed loading and lazy update strategies to load and update database change information when the database change information is actually applied to SQL query optimization.
在一种可能的实现方式中,元数据分析组件根据元数据的访问频率和修改频率,动态确定内存缓存中的元数据的保留或淘汰,以及通过事务日志分析或数据库的预写日志,确保元数据变更的原子性和一致性;In a possible implementation, the metadata analysis component dynamically determines whether to retain or eliminate metadata in the memory cache according to the access frequency and modification frequency of the metadata, and ensures the atomicity and consistency of metadata changes through transaction log analysis or a database write-ahead log;
其中,元数据分析组件在内存缓存中保留访问频率高于第一预设访问频率和/或修改频率低于第一预设修改频率的元数据,淘汰访问频率低于第二预设访问频率和/或修改频率高于第二预设修改频率的元数据;第一预设访问频率大于或等于第二预设访问频率,第一预设修改频率小于或等于第二预设修改频率。Among them, the metadata analysis component retains metadata with an access frequency higher than a first preset access frequency and/or a modification frequency lower than the first preset modification frequency in the memory cache, and eliminates metadata with an access frequency lower than a second preset access frequency and/or a modification frequency higher than the second preset modification frequency; the first preset access frequency is greater than or equal to the second preset access frequency, and the first preset modification frequency is less than or equal to the second preset modification frequency.
在一种可能的实现方式中,基于元数据增强后的抽象语法树,进行查询动态重组和优化,得到重组后的抽象语法树,包括:In a possible implementation, based on the abstract syntax tree enhanced by the metadata, the query is dynamically reorganized and optimized to obtain a reorganized abstract syntax tree, including:
若存在子查询或视图,则根据最新的元数据信息确定子查询或视图所依赖的目标表,在目标表的访问频率高于第三预设访问频率、目标表的索引符合预设高效索引条件、目标表的数据量小于预设数据量、目标表的数据变更频率低于第一预设变更频率、或者执行子查询或视图后的结果集大小小于预设大小时,将子查询或视图进行展开操作;If a subquery or view exists, the target table on which the subquery or view depends is determined according to the latest metadata information, and the subquery or view is expanded when the access frequency of the target table is higher than a third preset access frequency, the index of the target table meets the preset efficient index condition, the data volume of the target table is less than the preset data volume, the data change frequency of the target table is lower than the first preset change frequency, or the result set size after executing the subquery or view is less than the preset size;
若存在聚合操作,则根据最新的元数据信息中的统计信息,在聚合操作前提前过滤掉非必要数据,减少聚合操作的数据输入量;If there is an aggregation operation, then based on the statistical information in the latest metadata information, unnecessary data is filtered out in advance before the aggregation operation to reduce the amount of data input for the aggregation operation;
若存在排序操作,则根据最新的元数据信息确定待排序列是否存在索引、待排序列的数据量和待排序列的数据分布信息,并根据待排序列是否存在索引、待排序列的数据量和待排序列的数据分布信息,确定排序策略;If there is a sorting operation, determine whether there is an index for the sequence to be sorted, the amount of data in the sequence to be sorted, and the data distribution information of the sequence to be sorted according to the latest metadata information, and determine the sorting strategy according to whether there is an index for the sequence to be sorted, the amount of data in the sequence to be sorted, and the data distribution information of the sequence to be sorted;
若数据库支持列式存储,则根据最新的元数据信息调整数据库中的数据的存储方式。If the database supports column storage, adjust the storage method of the data in the database according to the latest metadata information.
在一种可能的实现方式中,基于元数据增强后的抽象语法树,进行查询动态重组和优化,得到重组后的抽象语法树,还包括:In a possible implementation, based on the abstract syntax tree enhanced by the metadata, query dynamic reorganization and optimization are performed to obtain a reorganized abstract syntax tree, which also includes:
若存在WHERE子句,则按照最新的元数据信息中的索引选择性由高到低的顺序,重新排序谓词;If there is a WHERE clause, reorder the predicates in descending order of index selectivity in the latest metadata information;
若存在多表连接,则根据最新的元数据信息中的表的大小信息、索引覆盖信息和/或表中数据的分布特征,调整多表连接的顺序和策略;If there are multiple table connections, adjust the order and strategy of the multiple table connections based on the table size information, index coverage information and/or distribution characteristics of the data in the table in the latest metadata information;
根据最新的元数据信息,确定查询过程中是否使用索引扫描,以及对于复合索引,分析查询条件中的列与复合索引中的索引列的匹配程度,确定复合索引中最有效的索引利用策略;Based on the latest metadata information, determine whether to use index scanning during the query process, and for composite indexes, analyze the matching degree between the columns in the query conditions and the index columns in the composite index, and determine the most effective index utilization strategy in the composite index;
根据最新的元数据信息中的分区信息,针对与查询条件相关的分区执行查询操作。According to the partition information in the latest metadata information, the query operation is performed on the partitions related to the query condition.
在一种可能的实现方式中,基于重组后的抽象语法树,执行SQL查询语句,包括:In a possible implementation, executing a SQL query statement based on the reorganized abstract syntax tree includes:
基于重组后的抽象语法树,生成多个执行计划;Generate multiple execution plans based on the reorganized abstract syntax tree;
从多个执行计划中,选取执行成本最低的目标执行计划,并按照目标执行计划,执行SQL查询语句;Select the target execution plan with the lowest execution cost from multiple execution plans, and execute the SQL query statement according to the target execution plan;
在执行SQL查询语句的过程中,获取实时性能指标;Get real-time performance indicators during the execution of SQL query statements;
若实时性能指标不符合预期目标,则生成新的执行计划,并根据新的执行计划,执行SQL查询语句;If the real-time performance indicators do not meet the expected targets, a new execution plan is generated, and the SQL query statement is executed according to the new execution plan;
获取SQL查询语句的实际执行结果,并根据实际执行结果更新元数据信息。Get the actual execution result of the SQL query statement and update the metadata information according to the actual execution result.
在一种可能的实现方式中,在基于重组后的抽象语法树,执行SQL查询语句之后,数据库SQL查询优化方法还包括:In a possible implementation, after executing the SQL query statement based on the reorganized abstract syntax tree, the database SQL query optimization method further includes:
基于查询日志和最新的元数据信息,确定SQL查询语句的执行频率和SQL查询语句的执行结果的数据变更频率;Based on the query log and the latest metadata information, determine the execution frequency of SQL query statements and the data change frequency of the execution results of SQL query statements;
若SQL查询语句的执行频率高于预设执行频率和/或SQL查询语句的执行结果的数据变更频率低于第二预设变更频率,则将SQL查询语句的执行结果存储在查询缓存中。If the execution frequency of the SQL query statement is higher than the preset execution frequency and/or the data change frequency of the execution result of the SQL query statement is lower than the second preset change frequency, the execution result of the SQL query statement is stored in the query cache.
在一种可能的实现方式中,查询缓存的有效期与元数据的数据变更模式关联。In a possible implementation, the validity period of the query cache is associated with the data change pattern of the metadata.
第二方面,本发明实施例提供了一种数据库SQL查询优化装置,包括:In a second aspect, an embodiment of the present invention provides a database SQL query optimization device, comprising:
第一获取模块,用于获取待处理的SQL查询语句,并将SQL查询语句转换为对应的抽象语法树;A first acquisition module is used to acquire the SQL query statement to be processed and convert the SQL query statement into a corresponding abstract syntax tree;
第二获取模块,用于基于元数据查询接口,获取最新的元数据信息;The second acquisition module is used to acquire the latest metadata information based on the metadata query interface;
生成模块,用于根据最新的元数据信息和抽象语法树,生成元数据增强后的抽象语法树;A generation module, used for generating an abstract syntax tree after metadata enhancement according to the latest metadata information and the abstract syntax tree;
查询优化模块,用于基于元数据增强后的抽象语法树,进行查询动态重组和优化,得到重组后的抽象语法树;A query optimization module is used to dynamically reorganize and optimize queries based on the abstract syntax tree enhanced by metadata to obtain a reorganized abstract syntax tree;
执行模块,用于基于重组后的抽象语法树,执行SQL查询语句。The execution module is used to execute SQL query statements based on the reorganized abstract syntax tree.
第三方面,本发明实施例提供了一种终端,包括存储器和处理器,存储器用于存储计算机程序,处理器用于调用并运行存储器中存储的计算机程序,执行如第一方面或第一方面的任一种可能的实现方式所述的数据库SQL查询优化方法。In a third aspect, an embodiment of the present invention provides a terminal, comprising a memory and a processor, the memory being used to store a computer program, the processor being used to call and run the computer program stored in the memory, and executing the database SQL query optimization method as described in the first aspect or any possible implementation of the first aspect.
第四方面,本发明实施例提供了一种计算机可读存储介质,计算机可读存储介质存储有计算机程序,计算机程序被处理器执行时实现如上第一方面或第一方面的任一种可能的实现方式所述的数据库SQL查询优化方法的步骤。In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores a computer program. When the computer program is executed by a processor, it implements the steps of the database SQL query optimization method as described in the first aspect or any possible implementation method of the first aspect.
本发明实施例提供一种数据库SQL查询优化方法、终端及存储介质,该方法在对待处理的SQL查询语句进行处理时,通过元数据查询接口,获取数据库最新的元数据信息,通过最新的元数据信息可以生成SQL查询语句对应的元数据增强后的抽象语法树,通过元数据增强后的抽象语法树,进行查询动态重组和优化,可以基于最新的元数据信息,智能地、动态地调整和优化SQL查询语句,可以适应不断变化的数据环境,进而可以提高查询性能和处理效率。The embodiment of the present invention provides a database SQL query optimization method, a terminal and a storage medium. When processing an SQL query statement to be processed, the method obtains the latest metadata information of the database through a metadata query interface, generates a metadata-enhanced abstract syntax tree corresponding to the SQL query statement through the latest metadata information, performs dynamic query reorganization and optimization through the metadata-enhanced abstract syntax tree, intelligently and dynamically adjusts and optimizes the SQL query statement based on the latest metadata information, adapts to a constantly changing data environment, and further improves query performance and processing efficiency.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required for use in the embodiments or the description of the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For ordinary technicians in this field, other drawings can be obtained based on these drawings without paying creative labor.
图1是本发明实施例提供的数据库SQL查询优化方法的流程示意图;1 is a schematic diagram of a flow chart of a database SQL query optimization method provided by an embodiment of the present invention;
图2是本发明实施例提供的数据库SQL查询优化装置的结构示意图;2 is a schematic diagram of the structure of a database SQL query optimization device provided by an embodiment of the present invention;
图3是本发明实施例提供的终端的示意图。FIG. 3 is a schematic diagram of a terminal provided by an embodiment of the present invention.
具体实施方式DETAILED DESCRIPTION
以下描述中,为了说明而不是为了限定,提出了诸如特定系统结构、技术之类的具体细节,以便透彻理解本发明实施例。然而,本领域的技术人员应当清楚,在没有这些具体细节的其它实施例中也可以实现本发明。在其它情况中,省略对众所周知的系统、装置、电路以及方法的详细说明,以免不必要的细节妨碍本发明的描述。In the following description, specific details such as specific system structures, technologies, etc. are provided for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present invention. However, it should be clear to those skilled in the art that the present invention may be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted to prevent unnecessary details from obstructing the description of the present invention.
为使本发明的目的、技术方案和优点更加清楚,下面将结合附图通过具体实施例来进行说明。In order to make the purpose, technical solutions and advantages of the present invention more clear, specific embodiments will be described below in conjunction with the accompanying drawings.
参见图1,其示出了本发明实施例提供的数据库SQL查询优化方法的实现流程图。上述数据库SQL查询优化方法的执行主体可以是终端。Referring to Fig. 1, it shows a flowchart of the implementation of the database SQL query optimization method provided by an embodiment of the present invention. The execution subject of the above database SQL query optimization method may be a terminal.
上述数据库SQL查询优化方法可以包括:The above database SQL query optimization method may include:
在S101中,获取待处理的SQL(Structured Query Language,结构化查询语言)查询语句,并将SQL查询语句转换为对应的抽象语法树(Abstract Syntax Tree,AST)。In S101, a SQL (Structured Query Language) query statement to be processed is obtained, and the SQL query statement is converted into a corresponding Abstract Syntax Tree (AST).
当获取到待处理的SQL查询语句后,通过数据库内部的解析器完成预处理和解析,将其转换为数据库能理解的内部AST。After the SQL query statement to be processed is obtained, it is preprocessed and parsed through the parser inside the database, and converted into an internal AST that the database can understand.
其中,S101至S105可以通过数据库中的查询优化器执行。S101 to S105 may be executed by a query optimizer in the database.
在S102中,基于元数据查询接口,获取最新的元数据信息。In S102, the latest metadata information is obtained based on the metadata query interface.
其中,最新的元数据信息也可以理解为实时的元数据信息,可以用于丰富解析后的查询表示,即AST,为之后的查询重组与优化提供必要的数据支持。Among them, the latest metadata information can also be understood as real-time metadata information, which can be used to enrich the parsed query representation, namely AST, to provide necessary data support for subsequent query reorganization and optimization.
数据库的元数据分析组件会对外提供元数据查询接口,通过该元数据查询接口,可以获取数据库最新的元数据信息。The metadata analysis component of the database will provide a metadata query interface to the outside world, through which the latest metadata information of the database can be obtained.
在S103中,根据最新的元数据信息和抽象语法树,生成元数据增强后的抽象语法树。In S103, an abstract syntax tree with metadata enhancement is generated according to the latest metadata information and the abstract syntax tree.
元数据增强后的抽象语法树为融合了最新的元数据信息的抽象语法树。最新的元数据信息可以支持查询优化器更加智能地重组和优化SQL查询语句。The metadata-enhanced abstract syntax tree is an abstract syntax tree that incorporates the latest metadata information. The latest metadata information can support the query optimizer to reorganize and optimize SQL query statements more intelligently.
在S104中,基于元数据增强后的抽象语法树,进行查询动态重组和优化,得到重组后的抽象语法树。In S104, based on the abstract syntax tree enhanced by the metadata, the query is dynamically reorganized and optimized to obtain a reorganized abstract syntax tree.
本申请可以基于元数据增强后的抽象语法树,利用元数据增强后的抽象语法树中融合的最新的元数据信息(比如,索引使用情况和统计信息等等),对SQL查询进行动态重组(或重写)和优化,从而可以得到重组后的抽象语法树。This application can dynamically reorganize (or rewrite) and optimize SQL queries based on the metadata-enhanced abstract syntax tree, using the latest metadata information (such as index usage and statistical information, etc.) integrated into the metadata-enhanced abstract syntax tree, so as to obtain a reorganized abstract syntax tree.
重组后的抽象语法树与重组前的抽象语法树相比,执行效率更高,查询性能更优。Compared with the abstract syntax tree before reorganization, the reorganized abstract syntax tree has higher execution efficiency and better query performance.
在S105中,基于重组后的抽象语法树,执行SQL查询语句。In S105 , the SQL query statement is executed based on the reorganized abstract syntax tree.
本申请实施例在对待处理的SQL查询语句进行处理时,通过元数据查询接口,获取数据库最新的元数据信息,通过最新的元数据信息可以生成SQL查询语句对应的元数据增强后的抽象语法树,通过元数据增强后的抽象语法树,进行查询动态重组和优化,可以基于最新的元数据信息,智能地、动态地调整和优化SQL查询语句,可以适应不断变化的数据环境,进而可以提高查询性能和处理效率。When processing an SQL query statement to be processed, the embodiment of the present application obtains the latest metadata information of the database through a metadata query interface, and generates a metadata-enhanced abstract syntax tree corresponding to the SQL query statement through the latest metadata information. The query is dynamically reorganized and optimized through the metadata-enhanced abstract syntax tree, and the SQL query statement can be intelligently and dynamically adjusted and optimized based on the latest metadata information, so as to adapt to the ever-changing data environment, thereby improving query performance and processing efficiency.
在一些实施例中,元数据查询接口为元数据分析组件提供的用于查询最新的元数据信息的接口;In some embodiments, the metadata query interface is an interface provided by the metadata analysis component for querying the latest metadata information;
元数据分析组件通过元数据监听器以事件订阅模式监控数据库的变化,并在监控到数据库的变化信息时,以差异的方式记录数据库的变化信息,以及采用延迟加载和惰性更新策略,在数据库的变化信息实际应用于SQL查询优化时,对数据库的变化信息进行加载和更新。The metadata analysis component monitors database changes in event subscription mode through metadata listeners, and records database change information in a differential manner when monitoring database change information. It also adopts delayed loading and lazy update strategies to load and update database change information when the database change information is actually applied to SQL query optimization.
元数据分析组件为数据库中的一个组件,用于对数据库中的元数据进行分析。其提供元数据查询接口,允许数据库中的查询优化器和其它组件访问最新的元数据信息。元数据查询接口可以响应元数据的查询请求,返回最新的、增量更新后的元数据信息。The metadata analysis component is a component in the database that analyzes the metadata in the database. It provides a metadata query interface that allows the query optimizer and other components in the database to access the latest metadata information. The metadata query interface can respond to metadata query requests and return the latest, incrementally updated metadata information.
元数据监听器嵌入在数据库管理系统中。通过元数据监听器,使用事件订阅模式,可以监控数据库的变化,包括数据库模式的变化,具体可以包括表、视图、索引、存储过程和函数的创建、更新或删除,等等。The metadata listener is embedded in the database management system. Through the metadata listener, using the event subscription mode, you can monitor the changes in the database, including changes in the database schema, which can include the creation, update or deletion of tables, views, indexes, stored procedures and functions, etc.
在实际应用中,当DDL(Data Definition Language,数据定义语言)语句执行时,元数据监听器捕获变更事件,并解析变更事件内容,确定变更的性质。示例性地,若是ALTERTABLE操作,元数据监听器将识别哪些列被添加、删除或更改了数据类型,等等。In actual applications, when a DDL (Data Definition Language) statement is executed, the metadata listener captures the change event and parses the change event content to determine the nature of the change. For example, if it is an ALTER TABLE operation, the metadata listener will identify which columns are added, deleted, or the data type is changed, etc.
在捕捉到数据库的变化信息后,元数据监听器可以以差异的形式将数据库的变化信息进行记录,完成同步过程。其中,同步过程仅记录数据库的变化信息,而不是替换已有的全部元数据,从而可以只同步数据库被更改的部分,提高同步效率。After capturing the database change information, the metadata listener can record the database change information in the form of differences to complete the synchronization process. The synchronization process only records the database change information instead of replacing all existing metadata, so that only the changed part of the database can be synchronized, improving synchronization efficiency.
对于元数据的更新,可以采用延迟加载和惰性更新策略,并不立即更新数据库的变化信息,以减少对数据库性能的影响。可以在数据库的变化信息作为元数据被实际用于SQL查询优化时,才对数据库的变化信息进行加载和更新操作。示例性地,可以在查询优化器通过元数据查询接口,访问最新的元数据信息时,对数据库的变化信息进行加载和更新操作,以便于查询优化器可以获取最新的元数据信息。For metadata updates, a delayed loading and lazy update strategy can be adopted, and the database change information is not updated immediately to reduce the impact on database performance. The database change information can be loaded and updated only when the database change information is actually used as metadata for SQL query optimization. Exemplarily, when the query optimizer accesses the latest metadata information through the metadata query interface, the database change information can be loaded and updated so that the query optimizer can obtain the latest metadata information.
在一些实施例中,元数据分析组件根据元数据的访问频率和修改频率,动态确定内存缓存中的元数据的保留或淘汰,以及通过事务日志分析或数据库的预写日志(Write-Ahead Logging,WAL),确保元数据变更的原子性和一致性;In some embodiments, the metadata analysis component dynamically determines whether to retain or eliminate metadata in the memory cache based on the access frequency and modification frequency of the metadata, and ensures the atomicity and consistency of metadata changes through transaction log analysis or database write-ahead logging (WAL);
其中,元数据分析组件在内存缓存中保留访问频率高于第一预设访问频率和/或修改频率低于第一预设修改频率的元数据,淘汰访问频率低于第二预设访问频率和/或修改频率高于第二预设修改频率的元数据;第一预设访问频率大于或等于第二预设访问频率,第一预设修改频率小于或等于第二预设修改频率。Among them, the metadata analysis component retains metadata with an access frequency higher than a first preset access frequency and/or a modification frequency lower than the first preset modification frequency in the memory cache, and eliminates metadata with an access frequency lower than a second preset access frequency and/or a modification frequency higher than the second preset modification frequency; the first preset access frequency is greater than or equal to the second preset access frequency, and the first preset modification frequency is less than or equal to the second preset modification frequency.
在本实施例中,综合访问频率和修改频率,高效的元数据的缓存机制为保留高访问频率(重要)和/或低修改频率(稳定)的元数据,淘汰低访问频率(不重要或不常用)和/或高修改频率(不稳定,频繁变化)的元数据。该元数据的缓存机制有助于优化缓存的利用效率,确保数据库性能的最优化。另外,这种缓存机制这不仅可以减少对存储的需求,而且可以提高访问速度。In this embodiment, the efficient metadata cache mechanism is to retain metadata with high access frequency (important) and/or low modification frequency (stable) by comprehensively considering access frequency and modification frequency, and eliminate metadata with low access frequency (unimportant or infrequently used) and/or high modification frequency (unstable, frequently changing). The metadata cache mechanism helps to optimize the utilization efficiency of the cache and ensure the optimization of database performance. In addition, this cache mechanism can not only reduce the demand for storage, but also improve the access speed.
如果某部分元数据被频繁访问,例如,经常被查询优化器用于优化查询,表明该部分元数据对于数据库操作非常重要。因此,该部分元数据应被保留在缓存中。如果某部分元数据不经常发生变更(即修改频率低),它们的保留价值增加,因为它们提供了稳定且长期有效的信息,不需要频繁更新。If some metadata is frequently accessed, for example, it is often used by the query optimizer to optimize queries, it indicates that this metadata is very important for database operations. Therefore, this metadata should be retained in the cache. If some metadata does not change frequently (i.e., the modification frequency is low), their retention value increases because they provide stable and long-term valid information and do not need to be updated frequently.
相对地,如果某部分元数据很少被访问,它们可能对当前的数据库操作不重要,可以考虑从缓存中淘汰以释放资源。如果某部分元数据经常发生变更,那么它在缓存中的价值可能较低,因为存储的信息很快就会过时,在这种情况下,频繁更新的元数据可能不适合长期保留在缓存中。On the other hand, if some metadata is rarely accessed, it may not be important for the current database operation and can be considered to be eliminated from the cache to free up resources. If some metadata changes frequently, its value in the cache may be low because the stored information will become outdated quickly. In this case, frequently updated metadata may not be suitable for long-term retention in the cache.
示例性地,本实施例可以将访问频率高于第一预设访问频率的元数据称为高访问频率的元数据,将修改频率低于第一预设修改频率的元数据称为低修改频率的元数据,将访问频率低于第二预设访问频率的元数据称为低访问频率的元数据,将修改频率高于第二预设修改频率的元数据称为高修改频率的元数据。Exemplarily, in this embodiment, metadata with an access frequency higher than a first preset access frequency may be referred to as metadata with a high access frequency, metadata with a modification frequency lower than the first preset modification frequency may be referred to as metadata with a low modification frequency, metadata with an access frequency lower than a second preset access frequency may be referred to as metadata with a low access frequency, and metadata with a modification frequency higher than the second preset modification frequency may be referred to as metadata with a high modification frequency.
其中,第一预设访问频率、第二预设访问频率、第一预设修改频率和第二预设修改频率可以根据数据库的实际工作负载、规模和特定需求等而有所不同。在实际应用中,这些参数是可配置的,以适应不同的环境和性能要求。The first preset access frequency, the second preset access frequency, the first preset modification frequency, and the second preset modification frequency may vary according to the actual workload, scale, and specific requirements of the database. In actual applications, these parameters are configurable to adapt to different environments and performance requirements.
访问频率可以理解为某个时长内的访问次数。在一定时间窗口(比如,每天或每小时)内,元数据被访问的次数超过某个阈值,即访问频率高于第一预设访问频率,可以认为是高访问频率的元数据。与高访问频率相反,低访问频率可能意味着在相同的时间窗口内,访问次数低于某个阈值,即访问频率低于第二预设访问频率。示例性地,若某个元数据每小时被访问的次数超过100次,可以认为是高访问频率的元数据,若某个元数据每天被访问的次数小于10次,可以认为是低访问频率的元数据,等等。Access frequency can be understood as the number of accesses within a certain period of time. Within a certain time window (for example, every day or every hour), if the number of times metadata is accessed exceeds a certain threshold, that is, the access frequency is higher than the first preset access frequency, it can be considered as metadata with a high access frequency. In contrast to high access frequency, low access frequency may mean that within the same time window, the number of accesses is lower than a certain threshold, that is, the access frequency is lower than the second preset access frequency. For example, if a certain metadata is accessed more than 100 times per hour, it can be considered as metadata with a high access frequency. If a certain metadata is accessed less than 10 times per day, it can be considered as metadata with a low access frequency, and so on.
修改频率可以理解为某个时长内的修改次数。在一定时间窗口(比如,每周、每月或每季度)内,元数据被修改的次数超过某个阈值,即修改频率高于第二预设修改频率,可以认为是高修改频率的元数据。与高修改频率相反,低修改频率可能意味着在相同的时间窗口内,修改次数低于某个阈值,或在更长的时间窗口内,很少发生修改,即修改频率低于第一预设修改频率。示例性地,若表结构的元数据每周修改3次或以上,可以认为是高修改频率,若索引的元数据在六个月内只更改了一次,可以认为是低修改频率,等等。Modification frequency can be understood as the number of modifications within a certain period of time. Within a certain time window (for example, weekly, monthly or quarterly), the number of times the metadata is modified exceeds a certain threshold, that is, the modification frequency is higher than the second preset modification frequency, which can be considered as metadata with a high modification frequency. In contrast to high modification frequency, low modification frequency may mean that within the same time window, the number of modifications is lower than a certain threshold, or within a longer time window, modifications rarely occur, that is, the modification frequency is lower than the first preset modification frequency. For example, if the metadata of the table structure is modified 3 times or more per week, it can be considered as a high modification frequency. If the metadata of the index is only changed once within six months, it can be considered as a low modification frequency, and so on.
元数据分析组件为了保证数据一致性,通过实施事务日志分析或使用数据库的写入前日志(WAL),保证元数据变更的原子性和一致性,确保元数据的更新不会中断正在进行的查询优化过程。To ensure data consistency, the metadata analysis component implements transaction log analysis or uses the database's write-ahead log (WAL) to ensure the atomicity and consistency of metadata changes and ensure that metadata updates do not interrupt the ongoing query optimization process.
元数据分析组件是整个数据库管理系统的基础,可以提供实时、准确的元数据信息,对于后续的查询优化提供重要的数据支持。本申请实施例通过有效地跟踪和同步数据库模式的变化,系统能够快速响应数据结构的变更,确保查询优化策略始终基于最新的数据库状态,不仅可以提高查询效率,还可以降低由于过时的元数据导致的潜在错误风险。The metadata analysis component is the foundation of the entire database management system, which can provide real-time and accurate metadata information and provide important data support for subsequent query optimization. By effectively tracking and synchronizing changes in the database schema, the embodiment of the present application can quickly respond to changes in the data structure and ensure that the query optimization strategy is always based on the latest database status, which can not only improve query efficiency, but also reduce the potential risk of errors caused by outdated metadata.
在一些可能的实现方式中,上述最新的元数据信息可以包括最新的表信息、索引信息和统计信息等等,具体可以包括最新的表的结构信息、表的大小信息(表的行数或表占用的物理空间大小等信息)、表统计信息、表关系信息、索引覆盖信息(索引的存在性、索引的类型、索引的属性以及索引选择性等信息)、数据的分布特征(数据是否均匀分布或集中分布、数据的分区情况、数据中是否有频繁出现的值以及数据是否存在长尾分布等信息)、索引的更新频率、索引的效率、分区信息(分区范围、分区大小和分区键等信息)、表和列的访问频率、表和列的数据变更频率、列值的分布、列值的频率、列值的基数、数据的时间戳或日期、列的访问次数、列的访问模式以及查询模式等信息中的一种或多种。In some possible implementations, the latest metadata information may include the latest table information, index information, and statistical information, etc. Specifically, it may include the latest table structure information, table size information (such as the number of rows in the table or the size of the physical space occupied by the table), table statistical information, table relationship information, index coverage information (such as the existence of the index, the type of index, the attributes of the index, and the index selectivity), data distribution characteristics (such as whether the data is evenly distributed or concentrated, the partitioning of the data, whether there are frequently occurring values in the data, and whether the data has a long-tail distribution), index update frequency, index efficiency, partition information (such as partition range, partition size, and partition key), table and column access frequency, table and column data change frequency, column value distribution, column value frequency, column value cardinality, data timestamp or date, column access count, column access mode, and query mode, etc., or one or more of the following information.
需要说明的是,上述仅给出了元数据信息的部分示例,元数据信息还可以包括其它相关的元数据,在此不做具体限制。It should be noted that the above only provides some examples of metadata information, and the metadata information may also include other related metadata, which is not specifically limited here.
在一些实施例中,上述S104可以包括:In some embodiments, the above S104 may include:
若存在子查询或视图,则根据最新的元数据信息确定子查询或视图所依赖的目标表,在目标表的访问频率高于第三预设访问频率、目标表的索引符合预设高效索引条件、目标表的数据量小于预设数据量、目标表的数据变更频率低于第一预设变更频率、或者执行子查询或视图后的结果集大小小于预设大小时,将子查询或视图进行展开操作;If a subquery or view exists, the target table on which the subquery or view depends is determined according to the latest metadata information, and the subquery or view is expanded when the access frequency of the target table is higher than a third preset access frequency, the index of the target table meets the preset efficient index condition, the data volume of the target table is less than the preset data volume, the data change frequency of the target table is lower than the first preset change frequency, or the result set size after executing the subquery or view is less than the preset size;
若存在聚合操作,则根据最新的元数据信息中的统计信息,在聚合操作前提前过滤掉非必要数据,减少聚合操作的数据输入量;If there is an aggregation operation, then based on the statistical information in the latest metadata information, unnecessary data is filtered out in advance before the aggregation operation to reduce the amount of data input for the aggregation operation;
若存在排序操作,则根据最新的元数据信息确定待排序列是否存在索引、待排序列的数据量和待排序列的数据分布信息,并根据待排序列是否存在索引、待排序列的数据量和待排序列的数据分布信息,确定排序策略;If there is a sorting operation, determine whether there is an index for the sequence to be sorted, the amount of data in the sequence to be sorted, and the data distribution information of the sequence to be sorted according to the latest metadata information, and determine the sorting strategy according to whether there is an index for the sequence to be sorted, the amount of data in the sequence to be sorted, and the data distribution information of the sequence to be sorted;
若数据库支持列式存储,则根据最新的元数据信息调整数据库中的数据的存储方式。If the database supports column storage, adjust the storage method of the data in the database according to the latest metadata information.
在本实施例中,若元数据增强后的AST存在子查询或视图,即为包含子查询或视图的复杂查询,基于最新的元数据信息来确定是否将子查询或视图“展开”(inline),可以简化查询并提高效率。In this embodiment, if there are subqueries or views in the metadata-enhanced AST, that is, a complex query containing subqueries or views, determining whether to "expand" (inline) the subquery or view is based on the latest metadata information, which can simplify the query and improve efficiency.
基于最新的元数据信息确定子查询或视图所依赖的目标表和/或目标列,并确定目标表和/或目标列的访问频率,若目标表和/或目标列在数据库中经常被访问,访问频率高(即,访问频率高于第三预设访问频率),则展开子查询或视图可以减少重复的数据访问和处理,从而提高查询效率。其中,第三预设访问频率可以根据实际需求进行设置,在此不做具体限制。Based on the latest metadata information, the target table and/or target column that the subquery or view depends on is determined, and the access frequency of the target table and/or target column is determined. If the target table and/or target column is frequently accessed in the database and has a high access frequency (i.e., the access frequency is higher than the third preset access frequency), then expanding the subquery or view can reduce repeated data access and processing, thereby improving query efficiency. The third preset access frequency can be set according to actual needs and is not specifically limited here.
基于最新的元数据信息确定目标表和/目标列是否有索引,若目标表和/或目标列存在索引,且为高效索引(即符合预设高效索引条件),则展开子查询或视图可以提高索引的利用率,加速数据访问。其中,预设高效索引条件可以包括索引选择性高于预设选择性值,且索引的更新频率低于预设更新频率,预设选择性值和预设更新频率可以根据实际需求进行设置,在此不做具体限制。Determine whether the target table and/or target column has an index based on the latest metadata information. If the target table and/or target column has an index and it is an efficient index (i.e., it meets the preset efficient index condition), then expanding the subquery or view can improve the utilization of the index and accelerate data access. Among them, the preset efficient index condition may include that the index selectivity is higher than the preset selectivity value, and the index update frequency is lower than the preset update frequency. The preset selectivity value and the preset update frequency can be set according to actual needs and are not specifically limited here.
基于最新的元数据信息可以确定目标表的数据量,目标表的数据量可以用目标表的大小信息表示。若目标表的数据量较小时(即目标表的数据量小于预设数据量),将子查询或视图展开可以直接减少处理的数据量,提高查询效率。其中,预设数据量可以根据实际需求进行设置,在此不做具体限制。The data volume of the target table can be determined based on the latest metadata information, and the data volume of the target table can be represented by the size information of the target table. If the data volume of the target table is small (that is, the data volume of the target table is less than the preset data volume), expanding the subquery or view can directly reduce the amount of data to be processed and improve query efficiency. The preset data volume can be set according to actual needs and is not specifically limited here.
基于最新的元数据信息可以确定目标表的数据变更频率,数据稳定的表适合展开操作,因为展开后的查询不需要频繁更新,有利于保持查询性能。而如果表的数据经常变更,展开操作可能不是最佳选择,因为它可能导致频繁的重新计算和更新。因此,若目标表的数据变更频率低于第一预设变更频率,即目标表的数据变更频率较低,则可以对子查询或视图进行展开操作。其中,第一预设变更频率可以根据实际使用需求进行设置,在此不做具体限制。Based on the latest metadata information, the data change frequency of the target table can be determined. Tables with stable data are suitable for expansion operations because the expanded query does not need to be updated frequently, which is conducive to maintaining query performance. If the data of the table changes frequently, the expansion operation may not be the best choice because it may lead to frequent recalculation and updating. Therefore, if the data change frequency of the target table is lower than the first preset change frequency, that is, the data change frequency of the target table is low, the subquery or view can be expanded. Among them, the first preset change frequency can be set according to actual usage requirements, and no specific restrictions are made here.
基于目标表的大小信息和查询条件,可以预估执行子查询或视图后的结果集的大小,结果集较小的子查询或视图更适合展开。因此,可以在执行子查询或视图后的结果集大小小于预设大小时,即结果集大小较小时,将子查询或视图进行展开操作。其中,预设大小可以根据实际使用需求进行设置,在此不做具体限制。Based on the size information of the target table and the query conditions, the size of the result set after executing the subquery or view can be estimated, and the subquery or view with a smaller result set is more suitable for expansion. Therefore, when the size of the result set after executing the subquery or view is less than the preset size, that is, when the result set size is small, the subquery or view can be expanded. The preset size can be set according to actual usage requirements and is not specifically limited here.
综上所述,对于展开后能够直接减少数据处理量、提高索引利用率,且不会引入新的复杂性或不必要的计算的子查询或视图,可以对其进行展开操作。In summary, the expansion operation can be performed on subqueries or views that can directly reduce the amount of data processing and improve index utilization without introducing new complexity or unnecessary calculations after expansion.
若元数据增强后的AST存在聚合操作,则可以根据最新的元数据信息中的统计信息优化聚合操作,在执行聚合操作前提前过滤掉非必要的数据,减少聚合操作的数据输入量。If there is an aggregation operation in the AST after metadata enhancement, the aggregation operation can be optimized according to the statistical information in the latest metadata information, and unnecessary data can be filtered out in advance before performing the aggregation operation, thereby reducing the amount of data input for the aggregation operation.
示例性地,可以根据统计信息中的列值的分布、列值的频率和列值的基数(不同列值的数量),确定数据的特性,根据数据的特性确定是否过滤掉该数据。比如,假设某列的某个值非常罕见,即该值的频率和基数比较低,则可能在聚合前过滤掉该值;基于统计信息识别高频出现的值或极端值,根据聚合的目的,确定是否包含或过滤掉这些值;基于统计信息中的时间戳或日期字段,确定数据的时效性,基于数据的时效性,过滤掉旧数据,即超过时效的数据;等等。For example, the characteristics of the data can be determined based on the distribution of column values, the frequency of column values, and the cardinality of column values (the number of different column values) in the statistical information, and whether to filter out the data can be determined based on the characteristics of the data. For example, assuming that a certain value in a certain column is very rare, that is, the frequency and cardinality of the value are relatively low, the value may be filtered out before aggregation; based on the statistical information, high-frequency values or extreme values are identified, and according to the purpose of aggregation, it is determined whether to include or filter out these values; based on the timestamp or date field in the statistical information, the timeliness of the data is determined, and based on the timeliness of the data, old data, that is, data that has exceeded the timeliness, is filtered out; and so on.
若元数据增强后的AST存在排序操作,则可以根据待排序列的索引、数据量和数据分布信息等,优化排序操作。If there is a sorting operation in the metadata-enhanced AST, the sorting operation can be optimized based on the index, data volume, and data distribution information of the column to be sorted.
示例性地,上述根据待排序列是否存在索引、待排序列的数据量和待排序列的数据分布信息,确定排序策略,可以包括:Exemplarily, the above-mentioned determining the sorting strategy according to whether there is an index in the sequence to be sorted, the amount of data in the sequence to be sorted, and the data distribution information of the sequence to be sorted may include:
若待排序列存在索引,且索引的类型为预设的适合排序操作的类型(比如B-tree索引),则可以直接基于待排序列的索引进行排序操作,而无需额外的排序步骤;否则,基于待排序列的数据量和/或数据分布信息,确定最适合的排序算法。其中,预设的适合排序操作的类型可以根据索引类型,将适合排序操作的类型作为预设的适合排序操作的类型。If the sequence to be sorted has an index, and the index type is a preset type suitable for sorting operations (such as a B-tree index), the sorting operation can be performed directly based on the index of the sequence to be sorted without an additional sorting step; otherwise, the most suitable sorting algorithm is determined based on the data volume and/or data distribution information of the sequence to be sorted. The preset type suitable for sorting operations can be used as the preset type suitable for sorting operations based on the index type.
因为索引通常以某种顺序(如升序或降序)存储数据,所以若索引包含所有需要的字段,则排序操作可以直接在索引上进行。Because indexes usually store data in some order (such as ascending or descending), sorting operations can be performed directly on the index if the index contains all the required fields.
上述待排序列的数据分布信息可以包括数据是否均匀分布、是否有频繁出现的值或是否存在长尾分布等等。The data distribution information of the sequence to be sorted may include whether the data is evenly distributed, whether there are frequently occurring values, or whether there is a long-tail distribution, etc.
示例性地,对于待排序列的数据量较大的情况,可以使用外部排序,对于待排序列的数据量较小的情况,可以使用快速排序,等等。Exemplarily, when the amount of data in the sequence to be sorted is large, external sorting may be used, and when the amount of data in the sequence to be sorted is small, quick sorting may be used, and so on.
通过上述方法可以有效的利用最新的元数据信息来优化聚合和排序的操作,从而提升查询性能和响应速度。The above method can effectively utilize the latest metadata information to optimize aggregation and sorting operations, thereby improving query performance and response speed.
如果数据库支持列式存储,可以根据查询涉及到的列和该列的访问模式调整数据的存储方式和访问方式。If the database supports columnar storage, you can adjust the data storage and access methods based on the columns involved in the query and the access mode of the columns.
基于最新的元数据信息中的统计信息(比如每列的访问次数/频率或访问模式)来确定哪些列被频繁访问,哪些列经常一起被访问,以及哪些列很少被查询或访问。对于列式存储的数据库,可以将经常一起被访问的列物理上靠近存储,或者在存储结构中做出优化,以加快这些列的联合访问速度。对于经常被查询或访问的列,可以优化其存储格式,比如优化压缩方式,以提高读取效率。比如,对于访问频率较低的列(冷数据),可以考虑使用更高的压缩比例来节省存储空间;对于访问频率较高的列(热数据),使用较低的压缩比例或更优化的存储格式,以加速读取操作。还可以基于访问模式,按时间或某个常用列来分区,从而优化相关查询式。Determine which columns are frequently accessed, which columns are frequently accessed together, and which columns are rarely queried or accessed based on the statistics in the latest metadata information (such as the number of accesses/frequency or access pattern of each column). For column-based storage databases, columns that are frequently accessed together can be stored physically close together, or optimizations can be made in the storage structure to speed up joint access to these columns. For columns that are frequently queried or accessed, their storage format can be optimized, such as compression methods, to improve reading efficiency. For example, for columns with low access frequency (cold data), you can consider using a higher compression ratio to save storage space; for columns with high access frequency (hot data), use a lower compression ratio or a more optimized storage format to speed up reading operations. You can also optimize related queries based on access patterns, partitioning by time or a common column.
通过以上步骤,可以有效地根据元数据信息来调整列式存储数据库中数据的存储方式和访问方式,特别是对于宽表(含有大量列的表),这样的优化可以显著提高分析查询的性能。Through the above steps, the storage and access methods of data in the column storage database can be effectively adjusted according to the metadata information. Especially for wide tables (tables with a large number of columns), such optimization can significantly improve the performance of analytical queries.
在一些实施例中,上述S104还可以包括:In some embodiments, the above S104 may further include:
若存在WHERE子句,则按照最新的元数据信息中的索引选择性由高到低的顺序,重新排序谓词;If there is a WHERE clause, reorder the predicates in descending order of index selectivity in the latest metadata information;
若存在多表连接,则根据最新的元数据信息中的表的大小信息、索引覆盖信息和/或表中数据的分布特征,调整多表连接的顺序和策略;If there are multiple table connections, adjust the order and strategy of the multiple table connections based on the table size information, index coverage information and/or distribution characteristics of the data in the table in the latest metadata information;
根据最新的元数据信息,确定查询过程中是否使用索引扫描,以及对于复合索引,分析查询条件中的列与复合索引中的索引列的匹配程度,确定复合索引中最有效的索引利用策略;Based on the latest metadata information, determine whether to use index scanning during the query process, and for composite indexes, analyze the matching degree between the columns in the query conditions and the index columns in the composite index, and determine the most effective index utilization strategy in the composite index;
根据最新的元数据信息中的分区信息,针对与查询条件相关的分区执行查询操作。According to the partition information in the latest metadata information, the query operation is performed on the partitions related to the query condition.
若元数据增强后的AST存在WHERE子句,查询优化器会重新对谓词进行排序,优先考虑基于高选择性索引的条件,以减少数据扫描量。If there is a WHERE clause in the metadata-enhanced AST, the query optimizer will reorder the predicates and give priority to conditions based on highly selective indexes to reduce the amount of data scanning.
若元数据增强后的AST存在多表连接,即涉及多表JOIN的情况,则可以根据多表连接涉及到的表的大小信息、索引覆盖信息和/或表中数据的分布特征,调整多表连接的顺序和策略,以提高查询效率。If there are multi-table connections in the metadata-enhanced AST, that is, multi-table JOINs are involved, the order and strategy of the multi-table connections can be adjusted according to the size information, index coverage information and/or distribution characteristics of the data in the tables involved in the multi-table connection to improve query efficiency.
表的大小信息包括表的行数或表占用的物理空间大小等信息。根据表的大小信息调整多表连接的顺序和策略,可以包括:基于涉及到的表的大小信息,选择最小的表作为驱动表进行多表连接操作。通过表的大小信息调整多表连接的顺序和策略可以有效减少在大表上的搜索和比较次数,尤其是在Nested Loops Join中,效果更明显。The size information of a table includes information such as the number of rows in the table or the size of the physical space occupied by the table. Adjusting the order and strategy of multi-table joins based on the size information of the table may include: selecting the smallest table as the driving table for multi-table join operations based on the size information of the tables involved. Adjusting the order and strategy of multi-table joins based on the size information of the tables can effectively reduce the number of searches and comparisons on large tables, especially in Nested Loops Join, where the effect is more obvious.
索引覆盖信息可以包括索引的存在性(是否存在索引)、索引的类型(比如,主键索引或非唯一索引等等)以及索引选择性(即索引键上的不同值的分布)等信息。根据索引覆盖信息调整多表连接策略可以包括:根据索引的存在性、索引的类型和索引选择性,利用有效的索引加速多表连接操作。示例性地,如果JOIN条件列上有高选择性的索引,可能会采用Indexed Nested Loops Join操作。Index coverage information may include information such as the existence of an index (whether an index exists), the type of index (e.g., primary key index or non-unique index, etc.), and index selectivity (i.e., the distribution of different values on the index key). Adjusting the multi-table join strategy based on index coverage information may include: using effective indexes to speed up multi-table join operations based on the existence of indexes, the type of indexes, and the index selectivity. For example, if there is a highly selective index on the JOIN condition column, an Indexed Nested Loops Join operation may be used.
表中数据的分布特征可以包括数据分布是否均匀或集中、以及数据的分区情况等信息。根据表中数据的分布特征调整多表连接策略可以包括:基于不同的数据分布特征,选择合适的JOIN类型,进行多表连接操作。示例性地,对于数据分布不均匀的表,选择HashJoin,完成多表连接操作;对于数据分布均匀且大小类似的两个及以上的表,选择MergeJoin,完成多表连接操作。大小类似是指大小的差值的绝对值小于预设大小差值,预设大小差值为一个较小值,可以根据实际使用需求进行设置。The distribution characteristics of the data in the table may include information such as whether the data distribution is uniform or concentrated, and the partitioning of the data. Adjusting the multi-table connection strategy according to the distribution characteristics of the data in the table may include: selecting the appropriate JOIN type based on different data distribution characteristics to perform multi-table connection operations. Exemplarily, for a table with uneven data distribution, select HashJoin to complete the multi-table connection operation; for two or more tables with uniform data distribution and similar sizes, select MergeJoin to complete the multi-table connection operation. Similar size means that the absolute value of the size difference is less than the preset size difference, and the preset size difference is a small value that can be set according to actual usage requirements.
本实施例还可以最新的元数据信息,进行索引选择优化。上述根据最新的元数据信息,确定查询过程中是否使用索引扫描,可以包括:根据最新的元数据信息中的索引信息(比如,索引选择性和索引的更新频率,等等),确定查询过程中是否使用索引扫描。示例性地,若某个列的索引高效,则查询将被重组为优先使用该索引。其中,索引选择性高且更新频率低的索引为高效索引,索引选择性高于预设选择性值,确定索引选择性高,索引的更新频率低于预设更新频率,确定索引的更新频率低。即,索引选择性高于预设选择性值,且索引的更新频率低于预设更新频率的索引为高效索引。This embodiment can also optimize the index selection based on the latest metadata information. The above-mentioned determination of whether to use index scanning in the query process based on the latest metadata information may include: determining whether to use index scanning in the query process based on the index information in the latest metadata information (for example, index selectivity and index update frequency, etc.). Exemplarily, if the index of a column is efficient, the query will be reorganized to give priority to using the index. Among them, an index with high index selectivity and low update frequency is an efficient index, an index with an index selectivity higher than a preset selectivity value is determined to have high index selectivity, and an index update frequency lower than a preset update frequency is determined to have low index update frequency. That is, an index with an index selectivity higher than a preset selectivity value and an index update frequency lower than a preset update frequency is an efficient index.
对于复合索引,将查询条件中的列与复合索引中的索引列的匹配程度最高的索引利用策略,确定为复合索引中最有效的索引利用策略。其中,在索引利用策略中,与查询条件中的各个列匹配的索引列的数量越多,与查询条件中的各个列匹配列的顺序越位于前面,与查询条件中的各个列匹配的索引列的选择性越高,则查询条件中的列与复合索引中的索引列的匹配程度越高。For a composite index, the index utilization strategy with the highest degree of matching between the columns in the query condition and the index columns in the composite index is determined as the most effective index utilization strategy for the composite index. Among them, in the index utilization strategy, the more the number of index columns matching the columns in the query condition is, the more the order of the columns matching the columns in the query condition is in the front, and the higher the selectivity of the index columns matching the columns in the query condition is, the higher the degree of matching between the columns in the query condition and the index columns in the composite index is.
具体地,可以首先分析复合索引的结构,确定复合索引包含的各个索引列和各个索引列的顺序;然后,检测查询条件中的各个列是否存在对应的索引列,并考虑对应的索引列在复合索引中的顺序,其中,在复合索引中,前面的列比后面的列更重,如果查询条件只使用了复合索引中的第一个列,索引的效率通常是最高的;接着,分析查询条件中对应的索引列的选择性,若查询条件中对应的索引列具有高选择性,那么使用该索引会更加高效。Specifically, you can first analyze the structure of the composite index to determine the index columns and the order of the index columns contained in the composite index; then, check whether there are corresponding index columns for each column in the query condition, and consider the order of the corresponding index columns in the composite index. In the composite index, the front columns are heavier than the back columns. If the query condition only uses the first column in the composite index, the efficiency of the index is usually the highest; then, analyze the selectivity of the corresponding index column in the query condition. If the corresponding index column in the query condition has high selectivity, it will be more efficient to use the index.
如果查询条件中的所有列都包含在复合索引中,并且该索引覆盖了查询所需的所有数据,这种情况下的索引称为“覆盖索引”,它可以极大地提高查询效率。If all columns in the query conditions are included in the composite index and the index covers all the data required for the query, the index in this case is called a "covering index", which can greatly improve query efficiency.
本实施例还可以根据最新的元数据信息中的分区信息,实现分区表的查询优化。如果数据存储在分区表中,查询重组可以考虑只扫描相关的分区,而不是全表扫描,从而可以提高查询效率,具体可以通过分析WHERE子句中与分区键相关的条件来实现。This embodiment can also optimize the query of the partition table according to the partition information in the latest metadata information. If the data is stored in the partition table, the query reorganization can consider scanning only the relevant partitions instead of the entire table, thereby improving the query efficiency, which can be achieved by analyzing the conditions related to the partition key in the WHERE clause.
上述分区信息可以包括分区范围、分区大小和分区键等信息。上述分区信息可以用于优化数据扫描范围。具体地,当查询条件与某个分区键相关时,可以仅扫描包含相关数据的分区,而不是全表扫描。示例性地,若一个表按照日期分区,且查询条件指定了特定日期范围,则可以仅扫描包含这些日期的分区。The partition information may include information such as partition range, partition size, and partition key. The partition information may be used to optimize the data scan range. Specifically, when the query condition is related to a partition key, only the partition containing the relevant data may be scanned instead of the entire table scan. For example, if a table is partitioned by date and the query condition specifies a specific date range, only the partition containing these dates may be scanned.
通过仅访问包含查询相关数据的分区,减少了需要处理的数据量,从而加快查询速度,提高总体查询效率。访问更少的分区意味着更少的磁盘I/O操作,这在大型数据库系统中尤其重要,因为I/O操作通常是性能瓶颈,通过分区定位可以减少磁盘I/O操作。对于一些特定类型的查询(如范围查询、聚合查询等等),通过定位到特定分区,可以显著减少需要处理的数据集的大小,优化特定类型的查询。By accessing only the partitions containing query-related data, the amount of data that needs to be processed is reduced, thereby speeding up queries and improving overall query efficiency. Accessing fewer partitions means fewer disk I/O operations, which is especially important in large database systems because I/O operations are often performance bottlenecks. Partition positioning can reduce disk I/O operations. For some specific types of queries (such as range queries, aggregation queries, etc.), by locating specific partitions, the size of the data set that needs to be processed can be significantly reduced, optimizing specific types of queries.
在一些实施例中,上述S105可以包括:In some embodiments, the above S105 may include:
基于重组后的抽象语法树,生成多个执行计划;Generate multiple execution plans based on the reorganized abstract syntax tree;
从多个执行计划中,选取执行成本最低的目标执行计划,并按照目标执行计划,执行SQL查询语句;Select the target execution plan with the lowest execution cost from multiple execution plans, and execute the SQL query statement according to the target execution plan;
在执行SQL查询语句的过程中,获取实时性能指标;Get real-time performance indicators during the execution of SQL query statements;
若实时性能指标不符合预期目标,则生成新的执行计划,并根据新的执行计划,执行SQL查询语句;If the real-time performance indicators do not meet the expected targets, a new execution plan is generated, and the SQL query statement is executed according to the new execution plan;
获取SQL查询语句的实际执行结果,并根据实际执行结果更新元数据信息。Get the actual execution result of the SQL query statement and update the metadata information according to the actual execution result.
基于重组后的抽象语法树,查询优化器可以生成多个潜在的执行计划,每个执行计划都有不同的数据访问路径和资源使用估计。利用最新的元数据信息中的统计信息,如行数、索引选择性等,查询优化器可以评估每个执行计划的预期成本,选择成本最低的执行计划作为目标执行计划,并将目标执行计划提交给数据库执行引擎进行处理。Based on the reorganized abstract syntax tree, the query optimizer can generate multiple potential execution plans, each with different data access paths and resource usage estimates. Using the latest metadata information, such as the number of rows, index selectivity, etc., the query optimizer can evaluate the expected cost of each execution plan, select the execution plan with the lowest cost as the target execution plan, and submit the target execution plan to the database execution engine for processing.
在SQL查询语句被执行的过程中,可以获取相关的实时性能指标,比如,执行时间、CPU使用率、内存消耗和I/O操作次数等指标中的至少一种。基于历史性能数据、预期的资源使用或经验值等等,可以设定各实时性能指标的阈值。将实时性能指标与对应的阈值进行比较,若实时性能指标不满足对应的阈值,则确定实时性能指标不符合预期目标,偏离预期。此时,查询优化器可以生成新的执行计划,比如,可以更换JOIN类型、选择不同的索引或改变操作的顺序,等等。根据新的执行计划,执行当前的SQL查询语句。新的执行计划也可以用于执行未来类似的查询。During the execution of the SQL query statement, relevant real-time performance indicators can be obtained, such as at least one of the indicators such as execution time, CPU usage, memory consumption and number of I/O operations. Thresholds for each real-time performance indicator can be set based on historical performance data, expected resource usage or experience values, etc. The real-time performance indicator is compared with the corresponding threshold. If the real-time performance indicator does not meet the corresponding threshold, it is determined that the real-time performance indicator does not meet the expected target and deviates from expectations. At this time, the query optimizer can generate a new execution plan, for example, it can change the JOIN type, select different indexes or change the order of operations, etc. According to the new execution plan, the current SQL query statement is executed. The new execution plan can also be used to execute similar queries in the future.
其中,当实时性能指标存在至少两个时,则所有的实时性能指标均不满足对应的阈值时,确定实时性能指标不符合预期目标。When there are at least two real-time performance indicators, and all the real-time performance indicators do not meet the corresponding thresholds, it is determined that the real-time performance indicators do not meet the expected target.
本实施例还可以根据SQL查询语句的实际执行结果,更新元数据信息中的统计信息,比如,可以包括更新行数、索引效率或数据分布等等,可以有助于查询优化器在未来做出更准确的执行计划预测。This embodiment can also update the statistical information in the metadata information according to the actual execution results of the SQL query statement, for example, it can include the number of updated rows, index efficiency or data distribution, etc., which can help the query optimizer to make more accurate execution plan predictions in the future.
在一些实施例中,在上述S105之后,上述数据库SQL查询优化方法还可以包括:In some embodiments, after S105, the database SQL query optimization method may further include:
基于查询日志和最新的元数据信息,确定SQL查询语句的执行频率和SQL查询语句的执行结果的数据变更频率;Based on the query log and the latest metadata information, determine the execution frequency of SQL query statements and the data change frequency of the execution results of SQL query statements;
若SQL查询语句的执行频率高于预设执行频率和/或SQL查询语句的执行结果的数据变更频率低于第二预设变更频率,则将SQL查询语句的执行结果存储在查询缓存中。If the execution frequency of the SQL query statement is higher than the preset execution frequency and/or the data change frequency of the execution result of the SQL query statement is lower than the second preset change frequency, the execution result of the SQL query statement is stored in the query cache.
本实施例通过分析查询日志和最新的元数据信息中的统计信息,可以识别出频繁执行和/或结果集相对稳定的查询,这些查询可以显著提高数据检索的速度,因此,可以将这些查询的查询结果(即执行结果)保存在查询缓存中。This embodiment can identify frequently executed queries and/or queries with relatively stable result sets by analyzing statistical information in query logs and the latest metadata information. These queries can significantly improve the speed of data retrieval. Therefore, the query results (i.e., execution results) of these queries can be saved in the query cache.
其中,可以通过定期或连续监控查询日志,实时捕捉各个查询的执行频率和/或执行模式,通过事件监听来持续监控执行结果的数据变更频率。基于实时的执行频率和执行结果的数据变更频率,来动态调整缓存策略。比如,当某个查询的执行频率显著增加,高于预设执行频率时,可以将其执行结果加入缓存;对于执行结果的数据变更频率低(低于第二预设变更频率)但执行频率高(高于预设执行频率)的查询,可以将执行结果加入缓存。Among them, the execution frequency and/or execution mode of each query can be captured in real time by regularly or continuously monitoring the query log, and the data change frequency of the execution result can be continuously monitored through event monitoring. The cache strategy is dynamically adjusted based on the real-time execution frequency and the data change frequency of the execution result. For example, when the execution frequency of a query increases significantly and is higher than the preset execution frequency, its execution result can be added to the cache; for queries with low data change frequency (lower than the second preset change frequency) but high execution frequency (higher than the preset execution frequency), the execution result can be added to the cache.
其中,预设执行频率和第二预设变更频率的取值可以根据实际使用需求确定,在此不做具体限制。Among them, the value of the preset execution frequency and the second preset change frequency can be determined according to actual usage requirements and is not specifically limited here.
本申请实施例的查询缓存策略可以基于数据库系统中的缓存管理组件执行。The query cache strategy of the embodiment of the present application can be executed based on the cache management component in the database system.
在一些实施例中,查询缓存的有效期与元数据的数据变更模式关联。In some embodiments, the validity period of the query cache is associated with a data change pattern of the metadata.
本实施例中的查询缓存的有效期与元数据的数据变更模式关联。可以通过元数据的更新时间戳或变更日志等,确定元数据的数据变更模式,进而根据元数据的数据变更模式,确定查询缓存的有效期。比如,假设元数据的数据变更模式为在每晚更新,则查询缓存的有效期可以为一天。The validity period of the query cache in this embodiment is associated with the data change mode of the metadata. The data change mode of the metadata can be determined by the update timestamp or change log of the metadata, and then the validity period of the query cache is determined according to the data change mode of the metadata. For example, assuming that the data change mode of the metadata is updated every night, the validity period of the query cache can be one day.
在本实施例中,当数据库的基础数据发生变更时,自动刷新相关的缓存内容,以保证数据的实时性和准确性。In this embodiment, when the basic data of the database changes, the relevant cache content is automatically refreshed to ensure the real-time and accuracy of the data.
本实施例可以通过分析和应用最新的元数据信息来动态管理查询缓存,智能决定哪些查询结果应该被缓存,以及确定缓存的有效期,而不是靠人工干预,可以进一步提高查询效率,并减少不必要的数据处理,显著减少重复查询的处理时间。自适应的缓存管理机制可以保证缓存数据的时效性和准确性,同时优化系统资源的使用,是提高查询性能的重要补充。This embodiment can dynamically manage the query cache by analyzing and applying the latest metadata information, intelligently determine which query results should be cached, and determine the validity period of the cache, rather than relying on manual intervention, which can further improve query efficiency, reduce unnecessary data processing, and significantly reduce the processing time of repeated queries. The adaptive cache management mechanism can ensure the timeliness and accuracy of cached data, while optimizing the use of system resources, and is an important supplement to improve query performance.
本申请实施例通过实时分析和应用元数据,不仅在查询执行前进行优化,还可以在查询执行过程中动态调整执行计划,甚至智能管理查询结果的缓存,以此全面提升数据库查询的效率和灵活性。可以充分利用和扩展元数据的潜力,通过创新性地整合增量元数据跟踪与同步、实时元数据驱动的SQL查询优化以及基于元数据的自适应查询缓存管理,为数据库查询优化提供了一个全面、动态和自适应的解决方案。数据库的查询策略从传统的静态查询优化转向更加动态和智能的数据驱动优化策略。The embodiments of the present application analyze and apply metadata in real time, not only to optimize before query execution, but also to dynamically adjust the execution plan during query execution, and even intelligently manage the cache of query results, so as to comprehensively improve the efficiency and flexibility of database queries. The potential of metadata can be fully utilized and expanded, and a comprehensive, dynamic and adaptive solution for database query optimization is provided through the innovative integration of incremental metadata tracking and synchronization, real-time metadata-driven SQL query optimization, and metadata-based adaptive query cache management. The query strategy of the database has shifted from traditional static query optimization to a more dynamic and intelligent data-driven optimization strategy.
本申请实施例提供的方法可以应用于gbase8s,也可以应用于其它使用的数据库,适用于大数据环境和高并发场景,如大型数据库系统、实时数据分析平台、商业智能和数据仓库等。可以解决在处理复杂和大规模数据集时的查询延迟问题,优化数据库资源的使用效率,降低系统的运行成本。通过动态调整查询策略和优化执行计划,可以实现对复杂查询的高效处理。该方法可以实现性能提升,资源优化,且适应性强。可以显著提高SQL查询的执行效率,尤其是在大型和复杂的数据库环境中,能有效降低响应时间,提高数据处理速率;通过智能索引管理和查询路径优化,减少不必要的数据读写和计算,从而优化系统资源利用,降低运行成本;根据实时数据库状态和元数据动态调整查询策略,使其能适应不同的数据模式和查询需求。The method provided in the embodiment of the present application can be applied to gbase8s, and can also be applied to other databases used, and is suitable for big data environments and high concurrency scenarios, such as large database systems, real-time data analysis platforms, business intelligence and data warehouses. It can solve the query delay problem when processing complex and large-scale data sets, optimize the use efficiency of database resources, and reduce the operating cost of the system. By dynamically adjusting the query strategy and optimizing the execution plan, efficient processing of complex queries can be achieved. The method can achieve performance improvement, resource optimization, and strong adaptability. The execution efficiency of SQL queries can be significantly improved, especially in large and complex database environments, which can effectively reduce response time and increase data processing rate; through intelligent index management and query path optimization, unnecessary data reading and writing and calculation are reduced, thereby optimizing system resource utilization and reducing operating costs; query strategies are dynamically adjusted according to real-time database status and metadata so that they can adapt to different data modes and query requirements.
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本发明实施例的实施过程构成任何限定。It should be understood that the order of execution of the steps in the above embodiment does not necessarily mean the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present invention.
以下为本发明的装置实施例,对于其中未详尽描述的细节,可以参考上述对应的方法实施例。The following is an embodiment of the device of the present invention. For details not described in detail, reference may be made to the corresponding method embodiment described above.
图2示出了本发明实施例提供的数据库SQL查询优化装置的结构示意图,为了便于说明,仅示出了与本发明实施例相关的部分,详述如下:FIG2 shows a schematic diagram of the structure of a database SQL query optimization device provided by an embodiment of the present invention. For ease of explanation, only the parts related to the embodiment of the present invention are shown, which are described in detail as follows:
如图2所示,数据库SQL查询优化装置20可以包括:第一获取模块21、第二获取模块22、生成模块23、查询优化模块24和执行模块25。As shown in FIG. 2 , the database SQL query optimization device 20 may include: a first acquisition module 21 , a second acquisition module 22 , a generation module 23 , a query optimization module 24 and an execution module 25 .
第一获取模块21,用于获取待处理的SQL查询语句,并将SQL查询语句转换为对应的抽象语法树;A first acquisition module 21, used to acquire a SQL query statement to be processed, and convert the SQL query statement into a corresponding abstract syntax tree;
第二获取模块22,用于基于元数据查询接口,获取最新的元数据信息;The second acquisition module 22 is used to acquire the latest metadata information based on the metadata query interface;
生成模块23,用于根据最新的元数据信息和抽象语法树,生成元数据增强后的抽象语法树;A generating module 23, used for generating an abstract syntax tree after metadata enhancement according to the latest metadata information and the abstract syntax tree;
查询优化模块24,用于基于元数据增强后的抽象语法树,进行查询动态重组和优化,得到重组后的抽象语法树;A query optimization module 24, configured to dynamically reorganize and optimize queries based on the abstract syntax tree enhanced by the metadata, and obtain a reorganized abstract syntax tree;
执行模块25,用于基于重组后的抽象语法树,执行SQL查询语句。The execution module 25 is used to execute the SQL query statement based on the reorganized abstract syntax tree.
在一种可能的实现方式中,元数据查询接口为元数据分析组件提供的用于查询最新的元数据信息的接口;In a possible implementation, the metadata query interface is an interface provided by the metadata analysis component for querying the latest metadata information;
元数据分析组件通过元数据监听器以事件订阅模式监控数据库的变化,并在监控到数据库的变化信息时,以差异的方式记录数据库的变化信息,以及采用延迟加载和惰性更新策略,在数据库的变化信息实际应用于SQL查询优化时,对数据库的变化信息进行加载和更新。The metadata analysis component monitors database changes in event subscription mode through metadata listeners, and records database change information in a differential manner when monitoring database change information. It also adopts delayed loading and lazy update strategies to load and update database change information when the database change information is actually applied to SQL query optimization.
在一种可能的实现方式中,元数据分析组件根据元数据的访问频率和修改频率,动态确定内存缓存中的元数据的保留或淘汰,以及通过事务日志分析或数据库的预写日志,确保元数据变更的原子性和一致性;In a possible implementation, the metadata analysis component dynamically determines whether to retain or eliminate metadata in the memory cache according to the access frequency and modification frequency of the metadata, and ensures the atomicity and consistency of metadata changes through transaction log analysis or a database write-ahead log;
其中,元数据分析组件在内存缓存中保留访问频率高于第一预设访问频率和/或修改频率低于第一预设修改频率的元数据,淘汰访问频率低于第二预设访问频率和/或修改频率高于第二预设修改频率的元数据;第一预设访问频率大于或等于第二预设访问频率,第一预设修改频率小于或等于第二预设修改频率。Among them, the metadata analysis component retains metadata with an access frequency higher than a first preset access frequency and/or a modification frequency lower than the first preset modification frequency in the memory cache, and eliminates metadata with an access frequency lower than a second preset access frequency and/or a modification frequency higher than the second preset modification frequency; the first preset access frequency is greater than or equal to the second preset access frequency, and the first preset modification frequency is less than or equal to the second preset modification frequency.
在一种可能的实现方式中,查询优化模块24具体用于:In a possible implementation, the query optimization module 24 is specifically used to:
若存在子查询或视图,则根据最新的元数据信息确定子查询或视图所依赖的目标表,在目标表的访问频率高于第三预设访问频率、目标表的索引符合预设高效索引条件、目标表的数据量小于预设数据量、目标表的数据变更频率低于第一预设变更频率、或者执行子查询或视图后的结果集大小小于预设大小时,将子查询或视图进行展开操作;If a subquery or view exists, the target table on which the subquery or view depends is determined according to the latest metadata information, and the subquery or view is expanded when the access frequency of the target table is higher than a third preset access frequency, the index of the target table meets the preset efficient index condition, the data volume of the target table is less than the preset data volume, the data change frequency of the target table is lower than the first preset change frequency, or the result set size after executing the subquery or view is less than the preset size;
若存在聚合操作,则根据最新的元数据信息中的统计信息,在聚合操作前提前过滤掉非必要数据,减少聚合操作的数据输入量;If there is an aggregation operation, then based on the statistical information in the latest metadata information, unnecessary data is filtered out in advance before the aggregation operation to reduce the amount of data input for the aggregation operation;
若存在排序操作,则根据最新的元数据信息确定待排序列是否存在索引、待排序列的数据量和待排序列的数据分布信息,并根据待排序列是否存在索引、待排序列的数据量和待排序列的数据分布信息,确定排序策略;If there is a sorting operation, determine whether there is an index for the sequence to be sorted, the amount of data in the sequence to be sorted, and the data distribution information of the sequence to be sorted according to the latest metadata information, and determine the sorting strategy according to whether there is an index for the sequence to be sorted, the amount of data in the sequence to be sorted, and the data distribution information of the sequence to be sorted;
若数据库支持列式存储,则根据最新的元数据信息调整数据库中的数据的存储方式。If the database supports column storage, adjust the storage method of the data in the database according to the latest metadata information.
在一种可能的实现方式中,查询优化模块24还可以用于:In a possible implementation, the query optimization module 24 may also be used to:
若存在WHERE子句,则按照最新的元数据信息中的索引选择性由高到低的顺序,重新排序谓词;If there is a WHERE clause, reorder the predicates in descending order of index selectivity in the latest metadata information;
若存在多表连接,则根据最新的元数据信息中的表的大小信息、索引覆盖信息和/或表中数据的分布特征,调整多表连接的顺序和策略;If there are multiple table connections, adjust the order and strategy of the multiple table connections based on the table size information, index coverage information and/or distribution characteristics of the data in the table in the latest metadata information;
根据最新的元数据信息,确定查询过程中是否使用索引扫描,以及对于复合索引,分析查询条件中的列与复合索引中的索引列的匹配程度,确定复合索引中最有效的索引利用策略;Based on the latest metadata information, determine whether to use index scanning during the query process, and for composite indexes, analyze the matching degree between the columns in the query conditions and the index columns in the composite index, and determine the most effective index utilization strategy in the composite index;
根据最新的元数据信息中的分区信息,针对与查询条件相关的分区执行查询操作。According to the partition information in the latest metadata information, the query operation is performed on the partitions related to the query condition.
在一种可能的实现方式中,执行模块25具体用于:In a possible implementation, the execution module 25 is specifically configured to:
基于重组后的抽象语法树,生成多个执行计划;Generate multiple execution plans based on the reorganized abstract syntax tree;
从多个执行计划中,选取执行成本最低的目标执行计划,并按照目标执行计划,执行SQL查询语句;Select the target execution plan with the lowest execution cost from multiple execution plans, and execute the SQL query statement according to the target execution plan;
在执行SQL查询语句的过程中,获取实时性能指标;Get real-time performance indicators during the execution of SQL query statements;
若实时性能指标不符合预期目标,则生成新的执行计划,并根据新的执行计划,执行SQL查询语句;If the real-time performance indicators do not meet the expected targets, a new execution plan is generated, and the SQL query statement is executed according to the new execution plan;
获取SQL查询语句的实际执行结果,并根据实际执行结果更新元数据信息。Get the actual execution result of the SQL query statement and update the metadata information according to the actual execution result.
在一种可能的实现方式中,数据库SQL查询优化装置20还可以包括:查询缓存优化模块。In a possible implementation, the database SQL query optimization device 20 may further include: a query cache optimization module.
查询缓存优化模块用于:在基于重组后的抽象语法树,执行SQL查询语句之后,基于查询日志和最新的元数据信息,确定SQL查询语句的执行频率和SQL查询语句的执行结果的数据变更频率;若SQL查询语句的执行频率高于预设执行频率和/或SQL查询语句的执行结果的数据变更频率低于第二预设变更频率,则将SQL查询语句的执行结果存储在查询缓存中。The query cache optimization module is used to: after executing the SQL query statement based on the reorganized abstract syntax tree, determine the execution frequency of the SQL query statement and the data change frequency of the execution result of the SQL query statement based on the query log and the latest metadata information; if the execution frequency of the SQL query statement is higher than the preset execution frequency and/or the data change frequency of the execution result of the SQL query statement is lower than the second preset change frequency, the execution result of the SQL query statement is stored in the query cache.
在一种可能的实现方式中,查询缓存的有效期与元数据的数据变更模式关联。In a possible implementation, the validity period of the query cache is associated with the data change pattern of the metadata.
图3是本发明实施例提供的终端的示意图。如图3所示,该实施例的终端3包括:处理器30和存储器31。所述存储器31用于存储计算机程序32,所述处理器30用于调用并运行所述存储器31中存储的计算机程序32,执行上述各个数据库SQL查询优化方法实施例中的步骤,例如图1所示的S101至S105。或者,所述处理器30用于调用并运行所述存储器31中存储的计算机程序32,实现上述各装置实施例中各模块/单元的功能,例如图2所示模块/单元21至25的功能。FIG3 is a schematic diagram of a terminal provided in an embodiment of the present invention. As shown in FIG3 , the terminal 3 of this embodiment includes: a processor 30 and a memory 31. The memory 31 is used to store a computer program 32, and the processor 30 is used to call and run the computer program 32 stored in the memory 31 to execute the steps in the above-mentioned various database SQL query optimization method embodiments, such as S101 to S105 shown in FIG1 . Alternatively, the processor 30 is used to call and run the computer program 32 stored in the memory 31 to implement the functions of each module/unit in the above-mentioned various device embodiments, such as the functions of the modules/units 21 to 25 shown in FIG2 .
示例性的,所述计算机程序32可以被分割成一个或多个模块/单元,所述一个或者多个模块/单元被存储在所述存储器31中,并由所述处理器30执行,以完成本发明。所述一个或多个模块/单元可以是能够完成特定功能的一系列计算机程序指令段,该指令段用于描述所述计算机程序32在所述终端3中的执行过程。例如,所述计算机程序32可以被分割成图2所示的模块/单元21至25。Exemplarily, the computer program 32 may be divided into one or more modules/units, which are stored in the memory 31 and executed by the processor 30 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of implementing specific functions, which are used to describe the execution process of the computer program 32 in the terminal 3. For example, the computer program 32 may be divided into modules/units 21 to 25 as shown in FIG. 2 .
所述终端3可包括,但不仅限于,处理器30、存储器31。本领域技术人员可以理解,图3仅仅是终端3的示例,并不构成对终端3的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件,例如所述终端还可以包括输入输出设备、网络接入设备、总线等。The terminal 3 may include, but is not limited to, a processor 30 and a memory 31. Those skilled in the art will appreciate that FIG3 is merely an example of the terminal 3 and does not limit the terminal 3. The terminal 3 may include more or fewer components than shown in the figure, or may combine certain components, or different components. For example, the terminal may also include input and output devices, network access devices, buses, etc.
所称处理器30可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器 (Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列 (Field-Programmable Gate Array,FPGA) 或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。The processor 30 may be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor, etc.
所述存储器31可以是所述终端3的内部存储单元,例如终端3的硬盘或内存。所述存储器31也可以是所述终端3的外部存储设备,例如所述终端3上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。进一步地,所述存储器31还可以既包括所述终端3的内部存储单元也包括外部存储设备。所述存储器31用于存储所述计算机程序以及所述终端所需的其他程序和数据。所述存储器31还可以用于暂时地存储已经输出或者将要输出的数据。The memory 31 may be an internal storage unit of the terminal 3, such as a hard disk or memory of the terminal 3. The memory 31 may also be an external storage device of the terminal 3, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, etc. equipped on the terminal 3. Further, the memory 31 may also include both an internal storage unit of the terminal 3 and an external storage device. The memory 31 is used to store the computer program and other programs and data required by the terminal. The memory 31 may also be used to temporarily store data that has been output or is to be output.
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,仅以上述各功能单元、模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能单元、模块完成,即将所述装置的内部结构划分成不同的功能单元或模块,以完成以上描述的全部或者部分功能。实施例中的各功能单元、模块可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中,上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。另外,各功能单元、模块的具体名称也只是为了便于相互区分,并不用于限制本申请的保护范围。上述系统中单元、模块的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。The technicians in the relevant field can clearly understand that for the convenience and simplicity of description, only the division of the above-mentioned functional units and modules is used as an example for illustration. In practical applications, the above-mentioned function allocation can be completed by different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiment can be integrated in a processing unit, or each unit can exist physically separately, or two or more units can be integrated in one unit. The above-mentioned integrated unit can be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing each other, and are not used to limit the scope of protection of this application. The specific working process of the units and modules in the above-mentioned system can refer to the corresponding process in the aforementioned method embodiment, which will not be repeated here.
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述或记载的部分,可以参见其它实施例的相关描述。In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not described or recorded in detail in a certain embodiment, reference can be made to the relevant descriptions of other embodiments.
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。Those of ordinary skill in the art will appreciate that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professional and technical personnel can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the present invention.
在本发明所提供的实施例中,应该理解到,所揭露的装置/终端和方法,可以通过其它的方式实现。例如,以上所描述的装置/终端实施例仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通讯连接可以是通过一些接口,装置或单元的间接耦合或通讯连接,可以是电性,机械或其它的形式。In the embodiments provided by the present invention, it should be understood that the disclosed devices/terminals and methods can be implemented in other ways. For example, the device/terminal embodiments described above are only schematic. For example, the division of the modules or units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or units, which can be electrical, mechanical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.
所述集成的模块/单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明实现上述实施例方法中的全部或部分流程,也可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一计算机可读存储介质中,该计算机程序在被处理器执行时,可实现上述各个数据库SQL查询优化方法实施例的步骤。其中,所述计算机程序包括计算机程序代码,所述计算机程序代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质可以包括:能够携带所述计算机程序代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、电载波信号、电信信号以及软件分发介质等。If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the present invention implements all or part of the processes in the above-mentioned embodiment method, and can also be completed by instructing the relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium. When the computer program is executed by the processor, the steps of the above-mentioned database SQL query optimization method embodiments can be implemented. Among them, the computer program includes computer program code, and the computer program code can be in source code form, object code form, executable file or some intermediate form. The computer-readable medium may include: any entity or device that can carry the computer program code, recording medium, U disk, mobile hard disk, disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electric carrier signal, telecommunication signal and software distribution medium.
以上所述实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围,均应包含在本发明的保护范围之内。The embodiments described above are only used to illustrate the technical solutions of the present invention, rather than to limit the same. Although the present invention has been described in detail with reference to the aforementioned embodiments, those skilled in the art should understand that the technical solutions described in the aforementioned embodiments may still be modified, or some of the technical features may be replaced by equivalents. Such modifications or replacements do not deviate the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included in the protection scope of the present invention.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410874325.6A CN118410068B (en) | 2024-07-02 | 2024-07-02 | Database SQL query optimization method, terminal and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410874325.6A CN118410068B (en) | 2024-07-02 | 2024-07-02 | Database SQL query optimization method, terminal and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118410068A CN118410068A (en) | 2024-07-30 |
CN118410068B true CN118410068B (en) | 2024-09-20 |
Family
ID=92032975
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410874325.6A Active CN118410068B (en) | 2024-07-02 | 2024-07-02 | Database SQL query optimization method, terminal and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118410068B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105740445A (en) * | 2016-02-02 | 2016-07-06 | 贵州大学 | A database query method and device |
CN111522816A (en) * | 2020-04-16 | 2020-08-11 | 云和恩墨(北京)信息技术有限公司 | Data processing method, device, terminal and medium based on database engine |
CN115470008A (en) * | 2022-11-14 | 2022-12-13 | 杭州拓数派科技发展有限公司 | Metadata access method and device and storage medium |
CN117130932A (en) * | 2023-09-01 | 2023-11-28 | 上海幻电信息科技有限公司 | SQL static detection method and device for big data platform |
CN117271481A (en) * | 2023-11-22 | 2023-12-22 | 深圳九有数据库有限公司 | Automatic database optimization method and equipment |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103942289B (en) * | 2014-04-12 | 2017-01-25 | 广西师范大学 | Memory caching method oriented to range querying on Hadoop |
US11874831B2 (en) * | 2021-08-19 | 2024-01-16 | Vesoft Inc. | Method and system for managing execution plan in graph database |
CN115168621A (en) * | 2022-06-14 | 2022-10-11 | 马上消费金融股份有限公司 | Data feedback method and device, electronic equipment and storage medium |
CN115576984A (en) * | 2022-09-13 | 2023-01-06 | 粤港澳国际供应链(广州)有限公司 | Method for generating SQL (structured query language) statement and cross-database query by Chinese natural language |
CN116955314A (en) * | 2023-07-05 | 2023-10-27 | 金篆信科有限责任公司 | Unified database maintenance and control methods, devices, equipment and storage media |
-
2024
- 2024-07-02 CN CN202410874325.6A patent/CN118410068B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105740445A (en) * | 2016-02-02 | 2016-07-06 | 贵州大学 | A database query method and device |
CN111522816A (en) * | 2020-04-16 | 2020-08-11 | 云和恩墨(北京)信息技术有限公司 | Data processing method, device, terminal and medium based on database engine |
CN115470008A (en) * | 2022-11-14 | 2022-12-13 | 杭州拓数派科技发展有限公司 | Metadata access method and device and storage medium |
CN117130932A (en) * | 2023-09-01 | 2023-11-28 | 上海幻电信息科技有限公司 | SQL static detection method and device for big data platform |
CN117271481A (en) * | 2023-11-22 | 2023-12-22 | 深圳九有数据库有限公司 | Automatic database optimization method and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN118410068A (en) | 2024-07-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11030189B2 (en) | Maintaining up-to-date materialized views for time-series database analytics | |
US10073888B1 (en) | Adjusting partitioning policies of a database system in view of storage reconfiguration | |
US10073885B2 (en) | Optimizer statistics and cost model for in-memory tables | |
US6789071B1 (en) | Method for efficient query execution using dynamic queries in database environments | |
US8935232B2 (en) | Query execution systems and methods | |
US10585887B2 (en) | Multi-system query execution plan | |
US8612421B2 (en) | Efficient processing of relational joins of multidimensional data | |
US9390115B2 (en) | Tables with unlimited number of sparse columns and techniques for an efficient implementation | |
US8620888B2 (en) | Partitioning in virtual columns | |
US20170083573A1 (en) | Multi-query optimization | |
US20140214897A1 (en) | SYSTEMS AND METHODS FOR ACCESSING A NoSQL DATABASE USING BUSINESS INTELLIGENCE TOOLS | |
US20040243555A1 (en) | Methods and systems for optimizing queries through dynamic and autonomous database schema analysis | |
US20100235344A1 (en) | Mechanism for utilizing partitioning pruning techniques for xml indexes | |
US20100036799A1 (en) | Query processing using horizontal partial covering join index | |
CN112015742B (en) | Data processing method and device of HTAP database based on row and column coexistence | |
CN119537383B (en) | Storage method and device based on cold and hot data separation and multi-mode database engine | |
US10776368B1 (en) | Deriving cardinality values from approximate quantile summaries | |
CN118410068B (en) | Database SQL query optimization method, terminal and storage medium | |
EP2005332A1 (en) | Management of statistical views in a database system | |
CN115033597B (en) | Method and system for optimizing SQL execution in HTAP database using deep learning | |
US11995084B1 (en) | Database system for querying time-series data stored in a tiered storage using a cloud platform | |
US20240311350A1 (en) | Methods and system for recommending storage format for migrating a rdbms | |
CN119847992A (en) | Data query method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address | ||
CP03 | Change of name, title or address |
Address after: Room 201-33, Unit 2, Building 2, No. 39 Gaoxin 6th Road, Binhai Science and Technology Park, Binhai New Area, Tianjin, China 300458 Patentee after: TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co.,Ltd. Country or region after: China Address before: J-518, Haitai Green Industry Base, No. 6 Haitai Development Sixth Road, Huayuan Industrial Zone, Binhai New Area, Tianjin, 300384 Patentee before: TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co.,Ltd. Country or region before: China |