+

WO2024239782A1 - Query plan construction method and apparatus, electronic device and storage medium - Google Patents

Query plan construction method and apparatus, electronic device and storage medium Download PDF

Info

Publication number
WO2024239782A1
WO2024239782A1 PCT/CN2024/083246 CN2024083246W WO2024239782A1 WO 2024239782 A1 WO2024239782 A1 WO 2024239782A1 CN 2024083246 W CN2024083246 W CN 2024083246W WO 2024239782 A1 WO2024239782 A1 WO 2024239782A1
Authority
WO
WIPO (PCT)
Prior art keywords
expression
sub
query
data
constructed
Prior art date
Application number
PCT/CN2024/083246
Other languages
French (fr)
Chinese (zh)
Inventor
石志林
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Publication of WO2024239782A1 publication Critical patent/WO2024239782A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24542Plan optimisation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases

Definitions

  • the present application relates to the field of computers, and in particular to a query plan construction method, device, electronic device and storage medium.
  • a database management system is a software system used to manage and organize data. It can help users create, manage, store and retrieve data. When retrieving data, the database management system can expand data queries through query optimization technology, build query plans based on the expanded data queries, and then query data based on the query plans.
  • the embodiments of the present application provide a query plan construction method, device, electronic device and storage medium, which can facilitate the construction of data query plans and improve query efficiency.
  • the embodiment of the present application provides a query plan construction method, which is executed by an electronic device and includes:
  • the embodiment of the present application further provides a query plan construction device, which is deployed on an electronic device and includes:
  • An acquisition unit used for acquiring a data query expression, where the data query expression includes keywords
  • the key unit is used to obtain the basic sub-expressions in the data query expression and the association relationship between the basic sub-expressions according to the keywords;
  • the equivalent unit is used to expand the basic sub-expression to obtain an equivalent sub-expression of the basic sub-expression.
  • the basic sub-expression and the equivalent sub-expression of the basic sub-expression have equivalent expression meanings;
  • a determination unit used for determining a sub-expression to be constructed from a basic sub-expression and an equivalent sub-expression of the basic sub-expression
  • a construction unit is used to construct a data query plan through sub-expressions to be constructed and associated relationships.
  • An embodiment of the present application also provides an electronic device, including a memory storing multiple instructions; the processor loads instructions from the memory to execute the steps in any query plan construction method provided in the embodiment of the present application.
  • An embodiment of the present application also provides a computer-readable storage medium, which stores a plurality of instructions, and the instructions are suitable for a processor to load to execute the steps in any query plan construction method provided in the embodiment of the present application.
  • An embodiment of the present application also provides a computer program product, including a computer program/instruction, which, when executed by a processor, implements the steps in any query plan construction method provided in the embodiment of the present application.
  • the embodiment of the present application can obtain a data query expression, which includes keywords.
  • a data query expression which includes keywords.
  • there can be multiple basic sub-expressions in the data query expression and there is an association relationship between each basic sub-expression, so the basic sub-expressions in the data query expression and the association relationship between the basic sub-expressions can be obtained according to the keywords.
  • the basic sub-expression is expanded to obtain the equivalent sub-expression of the basic sub-expression. Since the equivalent sub-expression and the basic sub-expression have equivalent expression meanings, the equivalent sub-expression has the association relationship of the basic sub-expression, and the sub-expression to be constructed can be screened out from the basic sub-expression and the equivalent sub-expression.
  • the sub-expression to be constructed can be a basic sub-expression or an equivalent sub-expression.
  • a data query plan is constructed by the sub-expression to be constructed and the association relationship.
  • Multiple sub-expressions to be constructed can construct a data query plan based on the association relationship, so that multiple sub-expressions to be constructed do not conflict with each other when constructing the data query plan. In this way, it is convenient to construct a data query plan and improve query performance.
  • FIG. 1a is a schematic diagram of a scenario of a query plan construction method provided in an embodiment of the present application
  • FIG1b is a flow chart of a query plan construction method provided in an embodiment of the present application.
  • FIG2a is a schematic diagram of the structure of a query system provided in an embodiment of the present application.
  • FIG2b is a schematic structural diagram of a query optimizer provided in an embodiment of the present application.
  • FIG2c is a schematic diagram of creating a sub-expression provided in an embodiment of the present application.
  • FIG2d is a schematic diagram of obtaining metadata information provided by an embodiment of the present application.
  • FIG. 2e is a schematic diagram of a structure for constructing a data query plan provided in an embodiment of the present application.
  • FIG2f is a schematic diagram of a flow chart of constructing a data query plan provided in an embodiment of the present application.
  • FIG3 is a schematic diagram of the structure of a query plan construction device provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the structure of a server provided in an embodiment of the present application.
  • Embodiments of the present application provide a query plan construction method, device, electronic device and storage medium.
  • the query plan construction device can be integrated into an electronic device, which can be a terminal, a server, or other devices.
  • the terminal can be a mobile phone, a tablet computer, a smart Bluetooth device, a laptop, or a personal computer (PC), etc.
  • the server can be a single server or a server cluster composed of multiple servers.
  • the query plan construction device can also be integrated into multiple electronic devices.
  • the query plan construction device can be integrated into multiple servers, and the query plan construction method of the present application can be implemented by multiple servers.
  • the server may also be implemented in the form of a terminal.
  • the electronic device can obtain a data query expression, which includes keywords; based on the keywords, obtain basic sub-expressions in the data query expression, and the association relationship between the basic sub-expressions; expand the basic sub-expressions to obtain equivalent sub-expressions of the basic sub-expressions, and the equivalent sub-expressions and the basic sub-expressions have equivalent expression meanings; determine the sub-expressions to be constructed from the basic sub-expressions and the equivalent sub-expressions of the basic sub-expressions; and construct a data query plan through the sub-expressions to be constructed and the association relationships.
  • the equivalent sub-expression has an association relationship with the basic sub-expression.
  • the sub-expression to be constructed can be screened out from the basic sub-expression and the equivalent sub-expression.
  • the sub-expression to be constructed can be a basic sub-expression or an equivalent sub-expression. Multiple sub-expressions to be constructed can construct a data query plan based on the association relationship, so that multiple sub-expressions to be constructed do not conflict with each other when constructing the data query plan. In this way, it is easy to construct a data query plan and improve the query performance.
  • a query plan construction method is provided, as shown in FIG1b , and the specific process of the query plan construction method can be as follows:
  • data query expression is a grammatical structure used to query and filter data from multiple database management systems.
  • the language of data query expression can be simple expression language (Expression Language, EL), query expression language (Query Domain Specific Language, Query DSL), a Java persistence standard query language (Java Persistence API Query Language, JPQL), an expression language in the Spring framework (Spring Expression Language, SpEL), etc.
  • Keywords are special words or characters used to specify query conditions in data query expressions, where a data query expression can contain multiple query conditions.
  • keywords may include query (SELECT): used to select the data columns to be queried from the database, from (FROM) used to specify the table of data to be queried, condition (WHERE) used to specify the filtering conditions for query results, sort (ORDER BY) used to sort the query results according to the specified column, and so on.
  • JDBC Java Database Connectivity
  • the method for obtaining data query expressions can be:
  • any database management system can convert the data query command into a data query expression and then send the data query expression to the query optimizer.
  • the database management system can convert the data query command into a data query expression, and then transmit the data query expression to the query optimizer;
  • the query optimizer is a component that provides query optimization technology.
  • obtaining a data query expression includes:
  • Get data query command the data query command includes keywords
  • the data query command is parsed and processed to obtain the syntax tree of the data query command
  • the preset data exchange language is used to perform format conversion on the syntax tree to obtain a data query expression.
  • the data query command is a command input by a user when querying data from a database management system.
  • the data query command can be a query field input by a user, a voice command of a user, or a command output when a data query script written by a user is executed, etc.
  • Keywords can correspond to query conditions in data query commands.
  • the data query command is to filter out the same data from the ath column of table T1 and the bth column of table T2.
  • the query conditions in the data query command may include querying the ath column of table T1, querying the bth column of table T2, and filtering out the same data from the ath column of table T1 and the bth column of table T2.
  • the keyword can be used to indicate querying the ath column of table T1, to indicate querying the bth column of table T2, and to indicate filtering out the same data from the ath column of table T1 and the bth column of table T2.
  • the syntax tree represents the query logic of the data query command, wherein the syntax tree includes multiple nodes representing different operations.
  • the SELECT node represents the operation of querying data
  • column 1 and column 2 nodes represent the columns to be queried
  • the FROM node represents which tables to query from
  • table1 and table2 are the names of these tables
  • the WHERE node represents the conditions to limit the query results, and so on.
  • the preset data interaction language can enable the query optimizer to interact with different types of database management systems.
  • the preset data interaction language can be EL language, Query DSL language, JPQL language, etc.
  • the basic sub-expression is used to implement the query conditions indicated by the keywords in the data query expression.
  • the basic sub-expression can be used to select the data columns of the table to be queried from the database management system, to specify the filtering conditions of the query results, to sort the query results according to the specified columns, and so on.
  • the association relationship is used to indicate the logical order of execution of the basic sub-expressions.
  • the association relationship can include parallel relationships, progressive relationships, and AND relationships between basic sub-expressions, etc.
  • the AND relationship is used to limit multiple query conditions to the same query. Only when all conditions are met at the same time can the data that meets the requirements be returned.
  • the data query expression indicates that the table join method is used to connect table T1 and table T2.
  • the equivalent sub-expression can be used to replace the basic sub-expression, and has the same query result as the basic sub-expression, but the query logic is different, wherein the basic sub-expression corresponds to one or more equivalent sub-expressions.
  • the equivalent expression meaning is used to limit the equivalent sub-expression and the basic sub-expression to have the same semantics, so that the query result of the equivalent sub-expression is the same as the query result of the basic sub-expression.
  • the equivalent sub-expression can be a sub-expression that scans (Scan) the data in the a-th column of table T1, or it can be a sub-expression that first scans the data in table T1 and then redistributes (Redistribute) the data in the a-th column of table T1, and so on. In this way, both the basic sub-expression and the equivalent sub-expression can query the data in the a-th column of table T1.
  • the basic sub-expression in order to facilitate obtaining an equivalent sub-expression, is expanded to obtain an equivalent sub-expression of the basic sub-expression, including:
  • the preset expression set including multiple preset expressions, and the multiple preset expressions have equivalent expression meanings;
  • an equivalent sub-expression is determined from a preset set of expressions.
  • the preset expression set is a set of multiple preset expressions with equivalent expression meanings.
  • a preset expression set records multiple preset expressions with different query logics, but each preset expression in the preset expression set has the same expression meaning and can obtain the same query result.
  • the preset expression set makes it easy to determine equivalent sub-expressions of basic sub-expressions.
  • the sub-expression to be constructed is a sub-expression whose query cost satisfies a preset condition among the basic sub-expression and the equivalent sub-expression of the basic sub-expression.
  • the sub-expression to be constructed can be a sub-expression with the smallest query cost among the basic sub-expression and the equivalent sub-expression, or any sub-expression whose query cost satisfies the preset condition, and so on.
  • a sub-expression to be constructed is determined from the basic sub-expression and the equivalent sub-expression of the basic sub-expression, including:
  • the first condition number is the condition number of the data connection condition associated with the base sub-expression
  • the second condition number is the condition number of the data connection condition associated with the equivalent sub-expression of the base sub-expression
  • a sub-expression to be constructed is determined from a basic sub-expression and an equivalent sub-expression of the basic sub-expression.
  • employee data table and department data table which contain employee and department information respectively.
  • employee department ID employee ID (department.id)
  • employee city employee city (department.city).
  • the first conditional number represents conditional data of a connection relationship between at least two tables as a base sub-expression.
  • the second condition number is condition data of a connection relationship between at least two tables represented by an equivalent sub-expression.
  • Inner Hash Join is based on hash tables to connect tables, and has excellent performance when processing large-scale data.
  • the main steps are to store the data in the two tables to be connected in the hash table in memory respectively, and use the hash function to map them to the corresponding hash buckets. Then, for the data in each hash bucket, the connection is performed by comparing the connection conditions between them.
  • Inner Hash Join needs to build additional data structures such as hash tables, it usually requires more memory and computing resources in actual situations.
  • a smaller number of conditions can be screened out from the first number of conditions and the second number of conditions. If the smaller number of conditions is the first number of conditions, the basic sub-expression is used as the sub-expression to be constructed. If the smaller number of conditions is the second number of conditions, the equivalent sub-expression corresponding to the second number of conditions is used as the sub-expression to be constructed.
  • the condition quantity with a large number can be screened out from the first condition quantity and the second condition quantity. If the condition quantity with a large number is the first condition quantity, the basic sub-expression is used as the sub-expression to be constructed. If the larger number of conditions is the second number of conditions, the equivalent sub-expression corresponding to the second number of conditions is used as the sub-expression to be constructed. In this way, data query can be executed efficiently, so as to provide better performance when processing large-scale data.
  • each expression has a different query logic, and thus each expression has a different query cost
  • determining a sub-expression to be constructed from the basic sub-expression and the equivalent sub-expression of the basic sub-expression includes:
  • the sub-expression to be constructed is determined from the basic sub-expression and the equivalent sub-expression of the basic sub-expression.
  • the data to be queried is the data in the table that the basic sub-expression needs to query.
  • the basic sub-expression is used to query table A
  • the data to be queried is the data in table A.
  • Metadata information is data information that describes the data to be queried.
  • metadata information is used to describe the data to be queried, including information such as the format, size, encoding method, and access protocol of the data.
  • the data volume is the size of the data to be queried that the user needs to query, usually expressed in the form of number of records or file size, etc.
  • the size of the data to be queried can be obtained through metadata information.
  • the basic sub-expression is Get(T1)
  • the equivalent sub-expression of the basic sub-expression is Scan(T1).
  • Get(T1) and Scan(T1) are used to query the data to be queried in table T1.
  • the basic sub-expression Get(T1) can be used as the sub-expression to be constructed. If the amount of data is large, the equivalent sub-expression Scan(T1) can be used as the sub-expression to be constructed.
  • the data query plan is an executable data query solution constructed based on the association relationship of the sub-expressions to be constructed.
  • the association relationship between basic sub-expression 1 and basic sub-expression 2 can be a parallel relationship
  • the association relationship between basic sub-expression 1 and basic sub-expression 2 and basic sub-expression 3 respectively can be an AND relationship.
  • the subexpression to be constructed can be a basic subexpression or an equivalent subexpression of a basic subexpression, that is, there is a subexpression to be constructed 1 determined from basic subexpression 1 and the equivalent subexpression 1 of basic subexpression 1, a subexpression to be constructed 2 determined from basic subexpression 2 and the equivalent subexpression 2 of basic subexpression 2, and a subexpression to be constructed 3 determined from basic subexpression 2 and the equivalent subexpression 2 of basic subexpression 2.
  • the sub-expressions 3 to be constructed determined in the sub-expressions 3 and the equivalent sub-expressions 3 of the basic sub-expressions 3, each sub-expression to be constructed has an association relationship with other sub-expressions to be constructed.
  • the association relationship between equivalent sub-expression 1 and basic sub-expression 2 or equivalent sub-expression 2 is a parallel relationship; the association relationship between equivalent sub-expression 1 and basic sub-expression 3 or equivalent sub-expression 3 can be an AND relationship; the association relationship between equivalent sub-expression 2 and basic sub-expression 3 or equivalent sub-expression 3 can be an AND relationship.
  • association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed is a parallel relationship
  • association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed and sub-expression 3 to be constructed is an AND relationship respectively
  • control sub-expression 1 to be constructed and sub-expression 2 to be constructed to construct query methods independently and then based on the association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed and sub-expression 3 to be constructed being an AND relationship respectively, construct a query plan for sub-expression 3 to be constructed to filter the query results of sub-expression 1 to be constructed and sub-expression 2 to be constructed.
  • a data query plan is constructed by using the sub-expression to be constructed and the association relationship, including:
  • the sub-query plans to be processed are merged to obtain a data query plan.
  • the current query optimization request is a query optimization request constructed by the query conditions associated with the sub-expression to be constructed in the data query expression.
  • the query conditions can be the data distribution, index type, etc. associated with the sub-expression to be constructed in the data query expression.
  • the current query optimization request can be used to optimize the data query plan constructed by the sub-expression to be constructed.
  • the current query optimization request may be a request for quickly acquiring data to be queried, a request for reducing query costs, a request for satisfying query speed and query costs within a certain range, and so on.
  • the initial subquery plan may be multiple unoptimized subquery plans constructed for the sub-expression to be constructed.
  • the multiple unoptimized subquery plans include subquery plans with high query costs, subquery plans with low query costs, subquery plans with slow query speeds, subquery plans with fast query speeds, and so on.
  • the subquery plan to be processed may be an initial subquery plan obtained from the initial subquery plan based on the association relationship and satisfying the current query optimization request.
  • sub-expression 1 to be constructed For example, based on the data query expression, sub-expression 1 to be constructed, sub-expression 2 to be constructed and sub-expression 3 to be constructed are determined, sub-expression 1 to be constructed is used to query table T1, sub-expression 2 to be constructed is used to query the data in column b of table T2, sub-expression 3 to be constructed is used to filter data from the data obtained from sub-expression 1 to be constructed and sub-expression 2 to be constructed, and the data query expression also includes sorting according to column a of table T1.
  • the association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed is a parallel relationship between the basic sub-expressions, and the association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed and sub-expression 3 to be constructed is an AND relationship.
  • the current query optimization request associated with sub-expression 1 to be constructed can be used to enable sub-expression 1 to be constructed to quickly obtain query table T1, and to sort the data queried from table T1 according to the ath column of the table.
  • the current query optimization request associated with sub-expression 2 to be constructed can be used to enable sub-expression 2 to be constructed to quickly obtain the bth column data in query table T2.
  • the current query optimization request associated with sub-expression 3 to be constructed can be used to merge the sorted (sorted according to the ath column of table T1) filtered data collected by sub-expression 3 to be constructed from all database nodes.
  • the initial subquery plan constructed by the sub-expression 1 to be constructed may include Get(T1), Scan(T1), Scan(T1)+Sort(T1.a), and the initial subquery plan constructed by the sub-expression 2 to be constructed may include Scan(T2)+Redistribute(T2.b), Get(T2.b).
  • the sub-query plan to be processed can be determined from the initial sub-query plan:
  • the initial subquery plans constructed by subexpression 1 to be constructed and subexpression 2 to be constructed can be independently filtered.
  • Scan(T1) can be filtered out from the initial subquery plan Get(T1), Scan(T1), Scan(T1)+Sort(T1.a), and through the current query optimization request associated with subexpression 2 to be constructed, Scan(T2)+Redistribute(T2.b) can be filtered out from the initial subquery plan Scan(T2)+Redistribute(T2.b), Get(T2.b).
  • an initial sub-query plan is constructed by using the sub-expression to be constructed, including:
  • connection relationship between the query operators is constructed
  • the query operators are controlled to be arranged according to the connection relationship to obtain the initial subquery plan.
  • the query operator may support obtaining required data from the database, for example, the query operator may be Get, Scan, Sort, GatherMerge, Redistribute, Replicate, and the like.
  • connection relationship is used to restrict the order in which query operators are executed.
  • sub-expression 2 to be constructed is used to obtain the data in column b of query table T2, that is, multiple query operators (Get, Scan, and Redistribute) associated with sub-expression 2 to be constructed, and the connection relationship between the query operators constructed by sub-expression 2 to be constructed can be Scan first and then Redistribute, or only Get.
  • the query operators are arranged according to the connection relationship, and the initial sub-query plan Scan(T2)+Redistribute(T2.b), Get(T2.b), or Get(T2.b) can be obtained.
  • the sub-query plan to be processed is determined from the initial sub-query plan, including:
  • metadata information of the to-be-queried data associated with the to-be-constructed sub-expression is obtained
  • the subquery plan to be processed is determined from the initial subquery plan.
  • the metadata information is data information describing the data to be queried.
  • the metadata information records various attributes and characteristics of the data, such as data type, data source, data storage method, data access rights, etc.
  • the various attributes of the data include the attributes and characteristics describing the table, such as table name, column name, column data type, column constraints, index information, data size, etc.
  • the computational amount is the computational operations and resource consumption required to execute the initial subquery plan, which can be understood as the query cost.
  • the computational amount may include the size or time of scanning data files, the size or time of reading indexes, and so on.
  • the association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed is a parallel relationship between sub-expressions
  • the association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed and sub-expression 3 to be constructed is an AND relationship.
  • the sub-query plan determined from the initial sub-query plan can be:
  • the sub-query plan to be processed can be directly screened from the initial sub-query plan constructed by sub-expression 1 to be constructed and sub-expression 2 to be constructed based on the calculation amount.
  • sub-query plan 1 to be processed constructed by sub-expression 1 to be constructed and sub-query plan 2 to be processed constructed by sub-expression 2 to be constructed so as to avoid overlapping plans between sub-query plan 3 to be processed constructed by sub-expression 3 to be constructed and sub-query plan 1 to be processed and sub-query plan 2 to be processed, or to avoid sub-query plan 1 to be processed, sub-query plan 2 to be processed and sub-query plan 3 to be processed from failing to fully execute all query conditions.
  • the sub-expression to be constructed includes a first expression and a second expression, the first expression is used to query the data to be queried, and the second expression is used to filter the data to be queried, and according to the current query optimization request and the association relationship associated with the sub-expression to be constructed, the sub-query plan to be processed is determined from the initial sub-query plan, including:
  • the unsatisfied requests and the association relationship determine a final optimization request from the current query optimization requests associated with the second expression, where the final optimization request includes the unsatisfied request;
  • a second subquery plan is determined from the second initial subquery plan, where the second initial subquery plan is an initial subquery plan constructed by using the second expression, and the subquery plans to be processed include the first subquery plan and the second subquery plan.
  • the first expression is used to query the data to be queried.
  • the first expression can be the above-mentioned sub-expression 1 to be constructed and the sub-expression 2 to be constructed, the sub-expression 1 to be constructed is used to query table T1, and the sub-expression 2 to be constructed is used to query the data in column b of table T2.
  • the current query optimization request associated with the first expression is used to filter the initial subquery plan required for retrieving the data to be queried.
  • the first initial sub-query plan is an initial sub-query plan constructed by the first expression, and is used to query the data to be queried, wherein the first initial sub-query plan constructed by the first expression may be one or more.
  • the first sub-query plan is a first initial sub-query that satisfies a condition of a query cost selected according to a current query optimization request associated with a first expression.
  • the current query optimization request satisfied by the first sub-query plan is a query condition that can be satisfied when the first sub-query plan is executed.
  • the unsatisfied request is a query optimization request that is not fulfilled by the first subquery plan in the current query optimization request associated with the first expression.
  • the current query optimization request associated with sub-expression 1 to be constructed can be used to enable sub-expression 1 to be constructed to quickly obtain query table T1, and sort the data queried from table T1 according to the a-th column of the table.
  • the first sub-query plan constructed by the first expression according to the current query optimization request associated with the first expression is Scan(T1)
  • the current query optimization request satisfied by the first sub-query plan only includes obtaining query table T1
  • the unsatisfied request is to sort the data queried from table T1 according to the a-th column of the table.
  • the final optimization request is a current query optimization request associated with the second expression obtained based on the unsatisfied request, so that all query conditions in the data query expression can be realized through the query optimization request satisfied by the final optimization request and the first sub-query plan.
  • the current query optimization request associated with the sub-expression 3 to be constructed can be used to merge the sorted (sorted according to the ath column of table T1) filtered data collected by the sub-expression 3 to be constructed from all database nodes. Since the first sub-query plan constructed by the sub-expression 1 to be constructed can only obtain the query table T1, the data queried from table T1 is not sorted according to the ath column of the table (the request is not met), thus, the current query optimization request associated with the second expression needs to include the unsatisfied request to realize the query condition in the data query expression.
  • the second initial subquery plan is an initial subquery plan constructed by using the second expression.
  • the second sub-query plan is a second initial sub-query that meets the condition based on the query cost selected by the final optimization request.
  • the method provided in the embodiment of the present application further includes:
  • the historical query plan corresponding to the pending historical request is used as the pending sub-query plan
  • the step of constructing an initial subquery plan through the sub-expression to be constructed is triggered.
  • the historical query optimization request is a query optimization request that has been used by the sub-expression to be constructed in a historical time period.
  • the historical query plan is a query plan constructed by the sub-expression to be constructed according to the historical query optimization request.
  • the pending historical request is a historical query optimization request that is the same as the current query optimization request.
  • the same sub-expression to be constructed when the same sub-expression to be constructed is constructing a sub-query plan to be processed, it may face multiple different query optimization requests in different query time periods. If the current query optimization request is the same as the historical query optimization request, that is, there is a historical request to be processed, then the historical query plan corresponding to the historical request to be processed is used as the sub-query plan to be processed. If there is no historical request to be processed, the initial sub-query plan is directly constructed through the sub-expression to be constructed, and then the initial sub-query plan is filtered through the current query optimization request to obtain the sub-query plan to be processed.
  • the historical query plan corresponding to the pending historical request is used as the pending sub-query plan, including:
  • plan retrieval link associated with the pending historical request is used to retrieve the historical query plan corresponding to the historical query optimization request as the pending sub-query plan.
  • the hash table can map the hash value with the historical query optimization request, and the historical query optimization request carries the historical query plan, so as to quickly filter the pending historical requests that are the same as the current query optimization request from the historical query optimization request, so that the historical query plan corresponding to the pending historical request can be obtained.
  • the first tag is the hash value obtained after the historical query optimization request is calculated by the hash function.
  • the plan retrieval link is a link for retrieving the query optimization plan used by the pending historical request.
  • the second tag is the hash value obtained after the current query optimization request is calculated using the same hash function.
  • the first label is the hash value obtained by calculating the historical query optimization request through the hash function
  • the second label is the hash value obtained by calculating the current query optimization request through the hash function
  • the replica instruction is sent to multiple data storage nodes of the database, so that the data storage nodes return the query data according to the replica instruction.
  • the query instruction is a representation of a data query plan so that the database can return query data through the query instruction.
  • the copy instruction is an instruction obtained by copying the query instruction.
  • the data storage node is a node for storing data in the database, and each data storage node returns data according to the received replica instruction.
  • sub-expression 3 to be constructed is used to filter data from the data obtained from sub-expression 1 to be constructed and sub-expression 2 to be constructed.
  • the returned data is filtered according to sub-expression 3 to be constructed.
  • the data queried through the data query plan can be applied to fields such as data analysis, such as advertising content query, financial risk control analysis, information recommendation, etc.
  • advertisements when querying for advertising content, advertisements can be quickly queried through a data query plan, and the advertisements can be displayed in the content played by the application.
  • each optimization stage is optimized independently, which may lead to conflicts in the optimization strategies of each stage, resulting in high uncertainty and error rate in the query process. Therefore, when querying the target type of advertisements through the query optimizer provided by the relevant technology, its query efficiency is low.
  • the current sub-expression in the data query expression can be expanded through an equivalent sub-expression, and the sub-expressions to be constructed with high query efficiency can be filtered out from the equivalent sub-expressions and the current sub-expression, and multiple sub-expressions to be constructed can be used to construct a data query plan based on the association relationship between the current sub-expressions, thereby avoiding conflicting or overlapping steps in the data query plan, further speeding up the query efficiency of specific types of advertisements. In this way, specific types of advertisements can be quickly displayed in the content played by the application.
  • data query plan construction can also be applied to the field of financial risk control analysis.
  • financial risk control analysis can quickly query data related to the financial product through the data query plan constructed by this application, and recommend the data related to the financial product to the user.
  • data query plan construction can also be applied to the field of information recommendation.
  • the data query plan constructed by this application can quickly search for information and recommend the searched information to the user.
  • the embodiment of the present application can obtain a data query expression, which includes keywords.
  • the data query expression can have multiple basic sub-expressions, and there is an association relationship between each basic sub-expression. Therefore, the basic sub-expressions in the data query expression and the association relationship between the basic sub-expressions can be obtained according to the keywords. Then the basic sub-expression is expanded to obtain the equivalent sub-expression of the basic sub-expression. Since the equivalent sub-expression and the basic sub-expression have equivalent expression meanings, the equivalent sub-expression has an association relationship with the basic sub-expression.
  • the sub-expression to be constructed can be screened out from the basic sub-expression and the equivalent sub-expression.
  • the sub-expression to be constructed can be It is a basic sub-expression or an equivalent sub-expression. Then, a data query plan is constructed through the sub-expression to be constructed and the associated relationship. Multiple sub-expressions to be constructed can construct a data query plan based on the associated relationship, so that multiple sub-expressions to be constructed do not conflict with each other when constructing a data query plan. In this way, it is easy to construct a data query plan and improve query performance.
  • the query system is taken as an example to describe the method of the embodiment of the present application in detail.
  • a query system includes a database management system and a query optimizer.
  • the query optimizer is located outside the database management system.
  • the query optimizer is used to construct a data query plan.
  • the database management system specifically includes:
  • a parser used for performing syntax analysis on the received data query command to obtain a syntax tree of the data query command
  • a query converter is used to receive the syntax tree sent by the parser, and use a preset data exchange language to perform format conversion processing on the syntax tree to obtain a data query expression, so that the query optimizer can construct a data query plan based on the data query expression;
  • the metadata provider is used to receive a request for obtaining metadata information sent by the query optimizer when the query optimizer constructs a data query plan;
  • the metadata provider is an interface for exchanging metadata information between the query optimizer and the database management system.
  • the metadata provider component is a plug-in specific to the database management system and is used to retrieve metadata information from the database management system.
  • the directory is used to receive the acquisition request sent by the metadata provider, and send the acquisition request to the metadata exchanger through the metadata provider, so that the acquisition request can obtain metadata information from the directory through the metadata exchanger, and send the metadata information to the parser;
  • the directory is a database in a database management system that stores metadata information
  • the metadata information is data information that describes the data in the database.
  • Metadata exchanger Use the preset data exchange language (EL language) to exchange metadata information.
  • EL language is a data exchange language in XML format, which contains metadata information in the database management system, such as table definition, operator definition, etc. Through the acquisition request of EL language, it can interact with different back-end database management systems and obtain the required metadata information. At the same time, the metadata exchanger can also extract metadata information from the database system and convert it into EL language to achieve decoupling of the query optimizer and the database management system;
  • EL language refers to a framework for exchanging information between the query optimizer and the database management system. Separating the query optimizer from the database management system requires building a communication mechanism for processing queries.
  • the framework uses an XML-based language to encode the necessary information required for communication, such as input queries, output plans, and metadata.
  • a simple communication protocol is used on the data exchange language framework to obtain metadata information and send data query plans;
  • a plan converter used to convert a data query plan into a query instruction
  • the executor is used to receive the query instructions sent by the plan converter and return the query data according to the query instructions.
  • the executor performs: dispatching a copy of the final query instruction (replica instruction) to each segment (data storage node).
  • the assigned executor on each segment acts as both the sender and receiver of data. Receiver. For example, the Redistribute(T2.b) instance running on segment S sends tuples on S to other segments based on the hash value of T2.b, and also receives tuples from other Redistribute(T2.b) instances on other segments.
  • Redistribute(T2.b) indicates the b-th column of table T2.
  • the input of the query optimizer in Figure 2a is a data query expression (EL query).
  • the output of the query optimizer is a data query plan (EL plan).
  • EL plan data query plan
  • the query optimizer builds a data query plan, it can query the database management system to obtain metadata information (such as table definitions).
  • the query optimizer abstracts metadata access details by allowing the database management system to register a metadata provider, which is responsible for serializing the metadata information into an EL query and sending it to the query optimizer. Metadata information can also be queried from a file containing metadata objects serialized in the EL language format.
  • the database management system needs to include a converter that consumes/emit data in EL format.
  • the query converter converts the syntax tree into a data query expression, while the plan converter converts the data query plan into an executable plan. The implementation of this plan converter is completed entirely outside the optimizer, which allows multiple database management systems to use the query optimizer by providing appropriate plan converters.
  • the query optimizer architecture can be highly extensible, and all components in the query optimizer and the database management system can be replaced and configured separately.
  • the query optimizer specifically includes:
  • Memo used to encode the data query plan constructed by the query optimizer in a compact memory data structure and store it in the memo;
  • the structure of the memo consists of a set of containers called expression groups, where each expression group contains logically equivalent expressions.
  • the memo expression group captures different subquery expressions of a data query expression (such as a filter on a table or a table join of two tables), and the group members of the expression group (base subexpressions and equivalent subexpressions of the base subexpressions) implement the same data query in different logical ways (such as different join orders).
  • the base subexpressions or equivalent subexpressions of the base subexpressions within each expression group are constructed through query operators.
  • the recursive structure of the memo allows compact encoding of huge data to build a data query plan space.
  • a searcher used for using a search mechanism to construct an initial subquery plan through the sub-expressions to be constructed in the expression group, and estimating a subquery plan to be processed with the minimum query cost from the initial subquery plan;
  • a task scheduler is used to start the search mechanism, through which parallel work units are created to perform the three main steps of query optimization: exploration (generating equivalent logical expressions), realization (building initial subquery plans), and optimization (optimizing initial subquery plans), where the required physical properties (such as sort order) are enforced;
  • optimization tools are used to provide search mechanisms for search engines.
  • the optimization tools include:
  • a transformation component used for determining an equivalent sub-expression, wherein the equivalent sub-expression and the basic sub-expression have equivalent expression meanings;
  • plan alternatives are generated by applying transformation rules, which can produce equivalent logical expressions (e.g. Inner Join(T1, T2) ⁇ Inner Join(T2, T1)) or physical realizations of existing sub-expressions (e.g. Join(T1, T2) ⁇ Hash Join(T2, T1)).
  • transformation rules can produce equivalent logical expressions (e.g. Inner Join(T1, T2) ⁇ Inner Join(T2, T1)) or physical realizations of existing sub-expressions (e.g. Join(T1, T2) ⁇ Hash Join(T2, T1)).
  • transformation rules can produce equivalent logical expressions (e.g. Inner Join(T1, T2) ⁇ Inner Join(T2, T1)) or physical realizations of existing sub-expressions (e.g. Join(T1, T2) ⁇ Hash Join(T2, T1)).
  • the results of applying transformation rules are copied to the memo, which may result in the creation of new expression groups and/or the addition of new expressions to existing expression groups.
  • Each transformation rule is an independent component that can be explicitly activated
  • Hash Join is used to combine the rows in two or more tables according to specific conditions to generate a larger new table that contains all the information in the source table.
  • Hash Join is to join the two tables through hash.
  • the (Hash) function is mapped to several buckets respectively, and then a Join operation is performed on each bucket, and finally the Join results are merged.
  • Enforcement attributes used to insert the physical attributes required for enforcement into the initial subquery plan
  • the query optimizer includes an extensible framework for describing query requirements and plan characteristics based on formal attribute specifications. Attributes are of different types, including logical attributes (such as output columns), physical attributes (such as sort order and data distribution), and scalar attributes (such as columns used in input conditions). When the query optimizer builds a data query plan, each query operator may request specific attributes from the sub-expression to be built.
  • the optimized initial sub-query plan may automatically meet the current query optimization request associated with the sub-expression to be built (the required attributes (such as the index scan (IndexScan) plan provides sorted data)), but many times it is necessary to insert some attributes into the initial sub-query plan through enforcement attributes (for example, the initial sub-query plan contains a Sort sorting operation, and the Sort operator needs to be inserted into the plan).
  • the framework allows each query operator to control the settings of the enforcer based on the attributes of the sub-query plan and the local behavior of the query operator, so as to ensure that the query data meets specific requirements, such as the order of output columns, the distribution of data, etc.
  • Metadata cache used to cache metadata information of data to be queried
  • the query cost estimation is used to estimate the query cost of the initial sub-query plan constructed by the searcher through the metadata information in the metadata cache, and obtain the query cost of the initial sub-query plan.
  • the query system caches metadata information on the query optimizer side and only queries metadata information from the directory when the metadata information is not found in the cache or has changed.
  • the query language input by the user is converted into Structured Query Language (SQL) (i.e., data query command), where SQL is a standard computer language for processing relational databases.
  • SQL Structured Query Language
  • the following example illustrates how the query optimizer constructs a data query plan:
  • the data query command is converted into a data query expression in EL format.
  • the data query command is converted into EL format, which contains the required output columns, sorting columns, data distribution, and query logic in XML format.
  • Metadata information (such as table and operator definitions) is decorated as a metadata identifier (ID) to allow the query optimizer to request more information when building a data query plan.
  • ID is a unique identifier consisting of a database system identifier, an object identifier, and a version number.
  • the data query expression in EL format is sent to the query optimizer, parsed and converted into an in-memory logical expression tree, and then copied to the memo.
  • Expression group 0 (Group0) is called the root group because it corresponds to the root of the logical plan.
  • the dependencies between operators in the logical plan are captured as references between expression groups, that is, the basic sub-expressions corresponding to Inner Join belong to the root group, and the basic sub-expressions corresponding to tables T1 and T2 belong to sub-groups.
  • Inner Join [T1, T2] means that expression group 1 (Group1) and expression group 2 (Group2) are connected as sub-groups.
  • Query optimization is performed in the following steps:
  • Inner Join[T1,T2] is generated as Inner Join[T2,T1].
  • the newly generated equivalent expression is added to the existing root group and may create a new expression group.
  • the memo structure has a built-in duplicate detection mechanism based on the expression topology to detect and eliminate any duplicate expressions created by different transformations.
  • the memo maintains the complete logical space of a given query.
  • the derived statistics mechanism of the query optimizer is then triggered to calculate the statistical objects of the memo group.
  • the statistical objects are mainly used to calculate the column histograms of cardinality and data skewness (i.e., metadata information of the data to be queried).
  • Derived statistics are performed on a compact memo structure to avoid expanding the search space. The system selects the expression with the highest reliable statistics to calculate the derived statistical data for the group of sub-expressions to be constructed, and the statistical calculation is based on the sub-expressions to be constructed.
  • an Inner Join expression (base subexpression) with a few join conditions (first number of conditions) has a smaller query cost than another equivalent Inner Join expression (equivalent subexpression) with more join conditions (second number of conditions) (this may occur when generating multiple join orders). Because the greater the number of join conditions, the higher the propagation and amplification errors may be. Calculating the confidence score of the cardinality is very challenging because the confidence scores need to be aggregated on all nodes of a given subexpression to be constructed. After selecting the subexpression to be constructed with the smallest query cost in the root group, the system recursively triggers derived statistics on the subgroups of the subexpression to be constructed. Finally, by combining the statistical objects of the subgroups, the statistical object of the root group (metadata information of the data to be queried) is constructed.
  • the derived statistics mechanism of the running example is shown.
  • a top-down traversal is performed, where the sub-expression of the root group requests statistics (metadata information of column a in table T1 (T1.a) and metadata information of column b in table T2 (T2.b)) from the sub-expressions of its sub-groups (Group1 and Group2).
  • the requested histograms are loaded on demand from the catalog through the metadata provider, parsed into EL queries (data query expressions) and stored in the metadata cache to serve future requests.
  • a bottom-up traversal is performed to combine the sub-statistics objects into the statistics object of the root group (metadata information of the data to be queried). Because the join condition may affect the column histogram, this combines the (possibly modified) histograms of T1.a and T2.b.
  • the constructed statistics objects are attached to a single group and can be incrementally updated during optimization (for example, by adding new histograms). This is crucial to keep the cost of derived statistics manageable.
  • Optimization begins by submitting a query optimization request to the root expression group of the memo, specifying query requirements such as result distribution and sort order. Submitting a request to the root expression group is equivalent to requesting the minimum query cost plan that satisfies the request among the physical operators in the root expression group.
  • the sub-expressions in the root expression group pass the corresponding request to the sub-expressions in the sub-group according to the incoming query optimization request and the local requirements of the operator.
  • Multiple identical query optimization requests may be submitted by the same group during optimization.
  • the system caches query optimization requests in a hash table. Incoming requests are only calculated if there is no query optimization request in the hash table (to build a sub-query plan through the query optimization request).
  • each sub-expression maintains a local hash table to map query optimization requests to corresponding historical query requests.
  • the local hash table provides a plan call link used when extracting historical query plans from the memo.
  • the query optimization request in the running example is shown.
  • the current query optimization request associated with the first expression specifies that the query results need to be collected on the master node based on the order of column a of table T1.
  • Figure 2e also includes the group hash table corresponding to the best group expression, as well as the Sort, Redistribute, and Replicate operators inserted in the memo.
  • the Gather operator collects data from all data storage nodes to the master node.
  • the GatherMerge operator collects sorted data from all data storage nodes to the master node and maintains the sorted order.
  • the Redistribute operator distributes tuples to segments based on the hash value of the given parameter.
  • FIG2e includes the query operators in Group0 (Nested Loop Join [Group1, Group2] (Inner Nest Loop Join [Group1, Group2]), Inner Nest Loop Join [Group2, Group1], Inner Hash Join [Group1, Group2] and Inner Hash Join [Group2, Group1, (Sort (T1.a), Gather, GatherMerge (T1.a)), the query operators in Group1 (Scan (T1), Sort (T1.a), Replicate), the query operators in Group2 (Scan (T2), Replicate, Redistribute (T2.b)), and according to the query optimization request, the data query plan on the right is constructed by the query operators in Group1, Group2 and Group0. As shown in FIG2f, the optimization of Inner Hash Join [T1, T2] is shown.
  • one of the alternatives is to align the distribution of child nodes according to the join condition, so that the nodes to be connected
  • the data can be co-located. This is achieved by requesting a Hashed(T1.a) distribution from the first expression 1 and a Hashed(T2.b) distribution from the first expression 2.
  • Both first expressions require any type of sort order to be provided.
  • the Inner Hash Join After finding the best plan for the subgroups (Group1 and Group2) (first subquery plan), the Inner Hash Join combines the sub-attributes to determine the provided distribution and sort order (final optimization request).
  • the best plan for the first expression 2 requires hash distribution of T2 on T2.b, because T2 was originally hashed on T2.a.
  • the best plan for the first expression 1 is a simple scan because T1 is already hash distributed on T1.a.
  • the unsatisfied attributes must be enforced (final optimization request).
  • the attribute enforcement in the system adopts a flexible framework that allows each operator, based on subplans and operator local behavior, to define the behavior of enforcing the required attributes. For example, the order-preserving Nested Loops Join (NL Join) operator may not need to enforce a sort order on the join if the outer child nodes already provide a sort order.
  • Figure 2f shows two possible initial subquery plans of Group0 that satisfy the query optimization request through attribute execution.
  • the data query plan on the left sorts the query data on the segment, and then collects and merges the sorted results on the main node.
  • the data query plan on the right collects the query data from the segment to the main node and then sorts it.
  • These different alternatives are encoded in the memo, and the cost model calculates the query cost (computational amount) of the initial subquery plan based on the metadata information of the data to be queried.
  • the data query plan is extracted from the memo.
  • Figure 2f illustrates the plan extraction of the running example. The relevant expressions have corresponding local hash tables.
  • Each local hash table maps the incoming optimization request to the corresponding sub-optimization request.
  • the sub-expression to find the best group expression is the GatherMerge operator.
  • the best group expression to find is Sort. Therefore, the GatherMerge operator is linked to Sort.
  • the corresponding best group expression is Inner Hash Join [T1, T2]. Therefore, Sort is linked to Inner Hash Join.
  • the same process is followed to complete the plan extraction, and the final data query plan is shown in Figure 2f.
  • the extracted best data query plan is serialized in EL format and sent to the database management system for execution.
  • This application is mainly aimed at important business scenarios such as data analysis, such as large-scale real-time data and offline data query systems.
  • the Internet generates a huge amount of data every day, which brings great challenges to the big data query system.
  • the data management and query systems in the business have made great progress in scalability, availability and processing performance, so that large data sets of hundreds of trillions of bytes (TeraByte, TB) or even petabytes (PetaByte, PB) can be analyzed and queried more quickly through structured query language (Structured Query Language, SQL) or SQL-like interfaces.
  • SQL Structured Query Language
  • Internet data is usually stored in a distributed computing framework (Hadoop), and the query engine compiles the initial SQL query into Spark or MapReduce jobs, where Spark is a distributed computing framework that uses memory computing, and MapReduce is a distributed computing framework that uses disk read and write IO.
  • Previous query optimizers usually use multi-stage query optimization, and the performance of compiled Spark or MapReduce jobs is poor. For this reason, this application designs a query plan construction method based on the waterfall tree model to accelerate the encoding of complex queries. After actual data testing, it can bring more than several times the query acceleration compared to previous query optimizers.
  • Hive For SQL queries on big data systems, a common solution in the industry is to use Hive to convert queries into MapReduce tasks, where Hive is a data warehouse tool based on Hadoop, and Hadoop is a distributed computing framework, but this method may lead to poor interactive analysis performance.
  • the industry has developed a number of professional query engines. However, these methods are often only applicable to specific host systems and are optimized, but cannot support the query requirements of multiple distributed systems at the same time. Since the query optimizer is the main influencing factor in analyzing query processing performance, and there are a large number of complex query processing requirements for big data in the business, this application designs a new type of query optimizer for a distributed query architecture for big data, and the business can quickly develop new optimization technologies and advanced query functions based on this optimizer. This application accelerates queries by designing metadata cache components, memos, mandatory attribute optimization and other technologies, and uses an efficient multi-core scheduler to increase the optimization speed.
  • the present application designs a new query optimizer.
  • a query optimizer based on a waterfall model optimization framework.
  • the query optimizer of the previous data query system was tightly coupled with the entire database management system, but a unique feature of the present application is that the designed optimizer can be run as a separate component outside the database system. This capability makes it possible for products with different computing architectures to use the same optimizer together.
  • the optimizer can be deployed and run as an independent product, and can be tested and optimized separately in detail without being coupled to the structure of the database system. This reduces the difficulty of deployment and testing.
  • this application facilitates the construction of data query plans and improves query efficiency.
  • the embodiment of the present application also provides a query plan construction device, which can be integrated in an electronic device, and the electronic device can be a terminal, a server, etc.
  • the terminal can be a mobile phone, a tablet computer, a smart Bluetooth device, a laptop, a personal computer, etc.
  • the server can be a single server or a server cluster composed of multiple servers.
  • the method of the embodiment of the present application will be described in detail by taking the query plan construction device specifically integrated in an electronic device as an example.
  • the query plan construction device may include an acquisition unit 310, a key unit 320, an equivalent unit 330, a determination unit 340, and a construction unit 350, as follows:
  • the acquisition unit 310 is used to acquire a data query expression, where the data query expression includes keywords.
  • obtaining a data query expression includes:
  • Get data query command the data query command includes keywords
  • the data query command is parsed and processed to obtain the syntax tree of the data query command
  • the preset data exchange language is used to perform format conversion on the syntax tree to obtain a data query expression.
  • the key unit 320 is used to obtain the basic sub-expressions in the data query expression and the association relationship between the basic sub-expressions according to the keywords.
  • the equivalent unit 330 is used to expand the basic sub-expression to obtain an equivalent sub-expression of the basic sub-expression.
  • the basic sub-expression and the equivalent sub-expression of the basic sub-expression have equivalent expression meanings.
  • the determining unit 340 is used to determine the sub-expression to be constructed from the basic sub-expression and the equivalent sub-expression of the basic sub-expression.
  • determining a sub-expression to be constructed from a basic sub-expression and an equivalent sub-expression of the basic sub-expression includes:
  • the first condition number is the condition number of the data connection condition associated with the base sub-expression
  • the second condition number is the condition number of the data connection condition associated with the equivalent sub-expression of the base sub-expression
  • a sub-expression to be constructed is determined from a basic sub-expression and an equivalent sub-expression of the basic sub-expression.
  • determining a sub-expression to be constructed from a basic sub-expression and an equivalent sub-expression of the basic sub-expression includes:
  • the sub-expression to be constructed is determined from the basic sub-expression and the equivalent sub-expression of the basic sub-expression.
  • the construction unit 350 is used to construct a data query plan through the sub-expressions to be constructed and the association relationships.
  • constructing a data query plan using sub-expressions to be constructed and associated relationships includes:
  • the sub-query plans to be processed are merged to obtain a data query plan.
  • constructing an initial subquery plan using the sub-expression to be constructed includes:
  • connection relationship between the query operators is constructed
  • the query operators are controlled to be arranged according to the connection relationship to obtain the initial subquery plan.
  • determining a to-be-processed sub-query plan from an initial sub-query plan includes:
  • the subquery plan to be processed is determined from the initial subquery plan.
  • the sub-expression to be constructed includes a first expression and a second expression, the first expression is used to query the data to be queried, and the second expression is used to filter the data to be queried, and according to the current query optimization request and the association relationship, the sub-query plan to be processed is determined from the initial sub-query plan, including:
  • the unsatisfied requests and the association relationship determine a final optimization request from the current query optimization requests associated with the second expression, where the final optimization request includes the unsatisfied request;
  • a second subquery plan is determined from the second initial subquery plan, where the second initial subquery plan is an initial subquery plan constructed by using the second expression, and the subquery plans to be processed include the first subquery plan and the second subquery plan.
  • it also includes:
  • the historical query plan corresponding to the pending historical request is used as the pending sub-query plan
  • the step of constructing an initial subquery plan through the sub-expression to be constructed is triggered.
  • obtaining a historical query optimization request associated with the sub-expression to be constructed and a historical query plan corresponding to the historical query optimization request includes:
  • the historical query plan corresponding to the pending historical request is used as the pending sub-query plan, including:
  • plan retrieval link associated with the pending historical request is used to retrieve the historical query plan corresponding to the historical query optimization request as the pending sub-query plan.
  • the method further includes:
  • the replica instruction is sent to multiple data storage nodes of the database, so that the data storage nodes return the query data according to the replica instruction.
  • the above units can be implemented as independent entities, or can be arbitrarily combined to be implemented as the same or several entities.
  • the specific implementation of the above units can refer to the previous method embodiments, which will not be repeated here.
  • the query plan construction device of this embodiment obtains a data query expression by an acquisition unit, and the data query expression includes keywords; the key unit obtains the current sub-expression in the data query expression and the association relationship between the current sub-expressions according to the keywords; the equivalent unit expands the basic sub-expression to obtain the equivalent sub-expression of the basic sub-expression, and the basic sub-expression and the equivalent sub-expression of the basic sub-expression have equivalent expression meanings; the determination unit determines the sub-expression to be constructed from the basic sub-expression and the equivalent sub-expression of the basic sub-expression; the construction unit constructs a data query plan through the sub-expression to be constructed and the association relationship. Therefore, the embodiment of the present application can facilitate the construction of a data query plan and improve query efficiency.
  • the embodiment of the present application also provides an electronic device, which can be a terminal, a server, etc.
  • the terminal can be a mobile phone, a tablet computer, a smart Bluetooth device, a laptop, a personal computer, etc.
  • the server can be a single server or a server cluster composed of multiple servers, etc.
  • the query plan construction device can also be integrated into multiple electronic devices.
  • the query plan construction device can be integrated into multiple servers, and the query plan construction method of the present application can be implemented by multiple servers.
  • the electronic device of this embodiment is a server as an example for detailed description.
  • FIG. 4 it shows a schematic diagram of the structure of the server involved in the embodiment of the present application. Specifically:
  • the server may include one or more processing core processors 410, one or more computer-readable storage media memories 420, a power supply 430, an input module 440, and a communication module 450.
  • processing core processors 410 one or more computer-readable storage media memories 420
  • power supply 430 an input module 440
  • communication module 450 a communication module 450
  • the processor 410 is the control center of the server, and uses various interfaces and lines to connect various parts of the entire server. It executes various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 420, and calling data stored in the memory 420.
  • the processor 410 may include one or more processing cores; in some embodiments, the processor 410 may integrate an application processor and a modem processor, wherein: The application processor mainly processes the operating system, user interface, and application programs, and the modem processor mainly processes wireless communications. It is understandable that the modem processor may not be integrated into the processor 410 .
  • the memory 420 can be used to store software programs and modules.
  • the processor 410 executes various functional applications and data processing by running the software programs and modules stored in the memory 420.
  • the memory 420 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application required for at least one function (such as a sound playback function, an image playback function, etc.), etc.; the data storage area may store data created according to the use of the server, etc.
  • the memory 420 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device, a flash memory device, or other volatile solid-state storage devices. Accordingly, the memory 420 may also include a memory controller to provide the processor 410 with access to the memory 420.
  • the server also includes a power supply 430 for supplying power to various components.
  • the power supply 430 can be logically connected to the processor 410 through a power management system, so that the power management system can manage charging, discharging, and power consumption.
  • the power supply 430 can also include any components such as one or more DC or AC power supplies, recharging systems, power failure detection circuits, power converters or inverters, and power status indicators.
  • the server may further include an input module 440, which may be used to receive input digital or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function controls.
  • an input module 440 which may be used to receive input digital or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function controls.
  • the server may further include a communication module 450.
  • the communication module 450 may include a wireless module.
  • the server may perform short-range wireless transmission through the wireless module of the communication module 450, thereby providing wireless broadband Internet access to the user.
  • the communication module 450 may be used to help the user send and receive emails, browse web pages, and access streaming media.
  • the server may further include a display unit, etc., which will not be described in detail herein.
  • the processor 410 in the server will load the executable files corresponding to the processes of one or more applications into the memory 420 according to the following instructions, and the processor 410 will run the application stored in the memory 420, thereby implementing any of the methods provided in the aforementioned embodiments.
  • multiple sub-expressions to be constructed can construct a query plan based on the association relationship, so that there is no conflict between the multiple sub-expressions to be constructed when constructing the data query plan. This makes it easier to construct the data query plan and improves the query performance.
  • an embodiment of the present application provides a computer-readable storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any query plan construction method provided in the embodiment of the present application.
  • the instructions can execute any method provided in the aforementioned embodiment.
  • the storage medium may include: read-only memory (ROM), random access memory (RAM), disk or CD, etc.
  • a computer program product comprising a computer program/instructions stored in a computer readable storage medium.
  • the machine-readable storage medium reads the computer program/instructions, and the processor executes the computer program/instructions, so that the computer device executes the methods provided in various optional implementations of the query plan construction provided in the above embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a query plan construction method and apparatus, an electronic device and a storage medium. The method of embodiments of the present application comprises: acquiring a data query expression, the data query expression comprising a keyword; obtaining basic sub-expressions in the data query expression and an association relationship between the basic sub-expressions according to the keyword; expanding the basic sub-expressions to obtain equivalent sub-expressions of the basic sub-expressions, wherein an equivalent sub-expression and a corresponding basic sub-expression have an equivalent expression meaning; from the basic sub-expressions and the equivalent sub-expressions of the basic sub-expressions, determining sub-expressions to be constructed; and constructing a data query plan by means of the sub-expressions to be constructed and the association relationship. In the present application, the data query plan can be constructed according to the association relationship by a plurality of sub-expressions to be constructed, so that the plurality of sub-expressions to be constructed do not conflict with each other during construction of the data query plan, thereby facilitating the construction of the data query plan, and improving the query performance.

Description

查询计划构建方法、装置、电子设备和存储介质Query plan construction method, device, electronic device and storage medium

本申请要求于2023年05月19日提交中国专利局、申请号202310575987.9、申请名称为“查询计划构建方法、装置、电子设备和存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application filed with the China Patent Office on May 19, 2023, with application number 202310575987.9 and application name “Query plan construction method, device, electronic device and storage medium”, all contents of which are incorporated by reference in this application.

技术领域Technical Field

本申请涉及计算机领域,具体涉及一种查询计划构建方法、装置、电子设备和存储介质。The present application relates to the field of computers, and in particular to a query plan construction method, device, electronic device and storage medium.

背景技术Background Art

数据库管理系统是一种用于管理和组织数据的软件系统,可以帮助用户创建、管理、存储和检索数据。在检索数据时,数据库管理系统可以通过查询优化技术扩展数据查询,并通过扩展后的数据查询构建查询计划,再依据查询计划查询数据。A database management system is a software system used to manage and organize data. It can help users create, manage, store and retrieve data. When retrieving data, the database management system can expand data queries through query optimization technology, build query plans based on the expanded data queries, and then query data based on the query plans.

然而,目前查询计划中的操作步骤可能存在相互冲突,导致构建的查询计划查询得到的数据可能与预期不一致,影响查询效率。However, the operation steps in the current query plan may conflict with each other, resulting in the data obtained by the query plan constructed being inconsistent with expectations, which affects query efficiency.

发明内容Summary of the invention

本申请实施例提供一种查询计划构建方法、装置、电子设备和存储介质,可以便于构建数据查询计划,提升查询效率。The embodiments of the present application provide a query plan construction method, device, electronic device and storage medium, which can facilitate the construction of data query plans and improve query efficiency.

本申请实施例提供一种查询计划构建方法,该方法由电子设备执行,包括:The embodiment of the present application provides a query plan construction method, which is executed by an electronic device and includes:

获取数据查询表达式,数据查询表达式包括关键词;Get data query expressions, which include keywords;

根据关键词,得到数据查询表达式中的基础子表达式,以及基础子表达式之间的关联关系;According to the keywords, basic sub-expressions in the data query expression and the association relationship between the basic sub-expressions are obtained;

对基础子表达式进行扩展,得到基础子表达式的等效子表达式,基础子表达式和基础子表达式的等效子表达式具有等效的表达含义;Expanding the basic sub-expression to obtain an equivalent sub-expression of the basic sub-expression, the basic sub-expression and the equivalent sub-expression of the basic sub-expression have equivalent expression meanings;

从基础子表达式和基础子表达式的等效子表达式中确定待构建子表达式;Determine the sub-expression to be constructed from the basic sub-expression and the equivalent sub-expression of the basic sub-expression;

通过待构建子表达式和关联关系构建数据查询计划。Construct a data query plan based on the sub-expressions to be constructed and the associated relationships.

本申请实施例还提供一种查询计划构建装置,该装置部署在电子设备上,包括:The embodiment of the present application further provides a query plan construction device, which is deployed on an electronic device and includes:

获取单元,用于获取数据查询表达式,数据查询表达式包括关键词;An acquisition unit, used for acquiring a data query expression, where the data query expression includes keywords;

关键单元,用于根据关键词,得到数据查询表达式中的基础子表达式,以及基础子表达式之间的关联关系;The key unit is used to obtain the basic sub-expressions in the data query expression and the association relationship between the basic sub-expressions according to the keywords;

等效单元,用于对基础子表达式进行扩展,得到基础子表达式的等效子表达式,基础子表达式和基础子表达式的等效子表达式具有等效的表达含义;The equivalent unit is used to expand the basic sub-expression to obtain an equivalent sub-expression of the basic sub-expression. The basic sub-expression and the equivalent sub-expression of the basic sub-expression have equivalent expression meanings;

确定单元,用于从基础子表达式和基础子表达式的等效子表达式中确定待构建子表达式;A determination unit, used for determining a sub-expression to be constructed from a basic sub-expression and an equivalent sub-expression of the basic sub-expression;

构建单元,用于通过待构建子表达式和关联关系构建数据查询计划。A construction unit is used to construct a data query plan through sub-expressions to be constructed and associated relationships.

本申请实施例还提供一种电子设备,包括存储器存储有多条指令;所述处理器从所述存储器中加载指令,以执行本申请实施例所提供的任一种查询计划构建方法中的步骤。 An embodiment of the present application also provides an electronic device, including a memory storing multiple instructions; the processor loads instructions from the memory to execute the steps in any query plan construction method provided in the embodiment of the present application.

本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质存储有多条指令,所述指令适于处理器进行加载,以执行本申请实施例所提供的任一种查询计划构建方法中的步骤。An embodiment of the present application also provides a computer-readable storage medium, which stores a plurality of instructions, and the instructions are suitable for a processor to load to execute the steps in any query plan construction method provided in the embodiment of the present application.

本申请实施例还提供一种计算机程序产品,包括计算机程序/指令,所述计算机程序/指令被处理器执行时实现本申请实施例所提供的任一种查询计划构建方法中的步骤。An embodiment of the present application also provides a computer program product, including a computer program/instruction, which, when executed by a processor, implements the steps in any query plan construction method provided in the embodiment of the present application.

本申请实施例可以获取数据查询表达式,数据查询表达式包括关键词,同时数据查询表达式可以存在多个基础子表达式,每个基础子表达式之间存在关联关系,故可以根据关键词,得到数据查询表达式中的基础子表达式,以及基础子表达式之间的关联关系。然后对基础子表达式进行扩展,得到基础子表达式的等效子表达式,由于等效子表达式和基础子表达式具有等效的表达含义,如此,等效子表达式具有基础子表达式的关联关系,从基础子表达式和等效子表达式中可以筛选出待构建子表达式,待构建子表达式可以是基础子表达式或等效子表达式。接着通过待构建子表达式和关联关系构建数据查询计划,多个待构建子表达式可以依据关联关系构建数据查询计划,使多个待构建子表达式在构建数据查询计划时不存在相互冲突,如此,便于构建数据查询计划,提升了查询性能。The embodiment of the present application can obtain a data query expression, which includes keywords. At the same time, there can be multiple basic sub-expressions in the data query expression, and there is an association relationship between each basic sub-expression, so the basic sub-expressions in the data query expression and the association relationship between the basic sub-expressions can be obtained according to the keywords. Then the basic sub-expression is expanded to obtain the equivalent sub-expression of the basic sub-expression. Since the equivalent sub-expression and the basic sub-expression have equivalent expression meanings, the equivalent sub-expression has the association relationship of the basic sub-expression, and the sub-expression to be constructed can be screened out from the basic sub-expression and the equivalent sub-expression. The sub-expression to be constructed can be a basic sub-expression or an equivalent sub-expression. Then, a data query plan is constructed by the sub-expression to be constructed and the association relationship. Multiple sub-expressions to be constructed can construct a data query plan based on the association relationship, so that multiple sub-expressions to be constructed do not conflict with each other when constructing the data query plan. In this way, it is convenient to construct a data query plan and improve query performance.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required for use in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application. For those skilled in the art, other drawings can be obtained based on these drawings without creative work.

图1a是本申请实施例提供的查询计划构建方法的场景示意图;FIG. 1a is a schematic diagram of a scenario of a query plan construction method provided in an embodiment of the present application;

图1b是本申请实施例提供的查询计划构建方法的流程示意图;FIG1b is a flow chart of a query plan construction method provided in an embodiment of the present application;

图2a是本申请实施例提供的查询系统的结构示意图;FIG2a is a schematic diagram of the structure of a query system provided in an embodiment of the present application;

图2b是本申请实施例提供的查询优化器的示意结构图;FIG2b is a schematic structural diagram of a query optimizer provided in an embodiment of the present application;

图2c是本申请实施例提供的创建子表达式的示意图;FIG2c is a schematic diagram of creating a sub-expression provided in an embodiment of the present application;

图2d是本申请实施例提供的获取元数据信息的示意图;FIG2d is a schematic diagram of obtaining metadata information provided by an embodiment of the present application;

图2e是本申请实施例提供的构建数据查询计划的结构示意图FIG. 2e is a schematic diagram of a structure for constructing a data query plan provided in an embodiment of the present application.

图2f是本申请实施例提供的构建数据查询计划的流程示意图图;FIG2f is a schematic diagram of a flow chart of constructing a data query plan provided in an embodiment of the present application;

图3是本申请实施例提供的查询计划构建装置的结构示意图;FIG3 is a schematic diagram of the structure of a query plan construction device provided in an embodiment of the present application;

图4是本申请实施例提供的服务器的结构示意图。FIG. 4 is a schematic diagram of the structure of a server provided in an embodiment of the present application.

具体实施方式DETAILED DESCRIPTION

下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。The following will be combined with the drawings in the embodiments of the present application to clearly and completely describe the technical solutions in the embodiments of the present application. Obviously, the described embodiments are only part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those skilled in the art without creative work are within the scope of protection of this application.

本申请实施例提供一种查询计划构建方法、装置、电子设备和存储介质。Embodiments of the present application provide a query plan construction method, device, electronic device and storage medium.

其中,该查询计划构建装置具体可以集成在电子设备中,该电子设备可以为终端、服务器等设备。其中,终端可以为手机、平板电脑、智能蓝牙设备、笔记本电脑、或者个人电脑(Personal Computer,PC)等设备;服务器可以是单一服务器,也可以是由多个服务器组成的服务器集群。 The query plan construction device can be integrated into an electronic device, which can be a terminal, a server, or other devices. The terminal can be a mobile phone, a tablet computer, a smart Bluetooth device, a laptop, or a personal computer (PC), etc. The server can be a single server or a server cluster composed of multiple servers.

在一些实施例中,该查询计划构建装置还可以集成在多个电子设备中,比如,查询计划构建装置可以集成在多个服务器中,由多个服务器来实现本申请的查询计划构建方法。In some embodiments, the query plan construction device can also be integrated into multiple electronic devices. For example, the query plan construction device can be integrated into multiple servers, and the query plan construction method of the present application can be implemented by multiple servers.

在一些实施例中,服务器也可以以终端的形式来实现。In some embodiments, the server may also be implemented in the form of a terminal.

例如,参考图1a,该电子设备可以获取数据查询表达式,数据查询表达式包括关键词;根据关键词,得到数据查询表达式中的基础子表达式,以及基础子表达式之间的关联关系;对基础子表达式进行扩展,得到基础子表达式的等效子表达式,等效子表达式和基础子表达式具有等效的表达含义;从基础子表达式和基础子表达式的等效子表达式中确定待构建子表达式;通过待构建子表达式和关联关系构建数据查询计划。For example, referring to Figure 1a, the electronic device can obtain a data query expression, which includes keywords; based on the keywords, obtain basic sub-expressions in the data query expression, and the association relationship between the basic sub-expressions; expand the basic sub-expressions to obtain equivalent sub-expressions of the basic sub-expressions, and the equivalent sub-expressions and the basic sub-expressions have equivalent expression meanings; determine the sub-expressions to be constructed from the basic sub-expressions and the equivalent sub-expressions of the basic sub-expressions; and construct a data query plan through the sub-expressions to be constructed and the association relationships.

其中,数据查询表达式可以存在多个基础子表达式,每个基础子表达式之间存在关联关系,通过等效子表达式可以扩展基础子表达式,由于等效子表达式和基础子表达式具有等效的表达含义,如此,等效子表达式具有基础子表达式的关联关系,从基础子表达式和等效子表达式中可以筛选出待构建子表达式,待构建子表达式可以是基础子表达式或等效子表达式,多个待构建子表达式可以依据关联关系构建数据查询计划,使多个待构建子表达式在构建数据查询计划时不存在相互冲突,如此,便于构建数据查询计划,提升了查询性能。Among them, there can be multiple basic sub-expressions in the data query expression, and there is an association relationship between each basic sub-expression. The basic sub-expression can be extended through the equivalent sub-expression. Since the equivalent sub-expression and the basic sub-expression have equivalent expression meanings, the equivalent sub-expression has an association relationship with the basic sub-expression. The sub-expression to be constructed can be screened out from the basic sub-expression and the equivalent sub-expression. The sub-expression to be constructed can be a basic sub-expression or an equivalent sub-expression. Multiple sub-expressions to be constructed can construct a data query plan based on the association relationship, so that multiple sub-expressions to be constructed do not conflict with each other when constructing the data query plan. In this way, it is easy to construct a data query plan and improve the query performance.

以下分别进行详细说明。需说明的是,以下实施例的序号不作为对实施例优选顺序的限定。It should be noted that the serial numbers of the following embodiments are not intended to limit the preferred order of the embodiments.

在本实施例中,提供了一种查询计划构建方法,如图1b所示,该查询计划构建方法的具体流程可以如下:In this embodiment, a query plan construction method is provided, as shown in FIG1b , and the specific process of the query plan construction method can be as follows:

110、获取数据查询表达式,数据查询表达式包括关键词。110. Obtain a data query expression, where the data query expression includes keywords.

其中,数据查询表达式是一种用于从多个数据库管理系统中查询和过滤数据的语法结构。比如,数据查询表达式的语言可以是简单表达式语言(Expression Language,EL)、查询表达式语言(Query Domain Specific Language,Query DSL)、一种Java持久化标准的查询语言(Java Persistence API Query Language,JPQL)、春(Spring)框架中的一种表达式语言(Spring Expression Language,SpEL),等等。Among them, data query expression is a grammatical structure used to query and filter data from multiple database management systems. For example, the language of data query expression can be simple expression language (Expression Language, EL), query expression language (Query Domain Specific Language, Query DSL), a Java persistence standard query language (Java Persistence API Query Language, JPQL), an expression language in the Spring framework (Spring Expression Language, SpEL), etc.

关键词用于指定数据查询表达式中查询条件的特殊单词或字符,其中,数据查询表达式可以存在多个查询条件。比如,关键词可以包括查询(SELECT):用于从数据库中选择需要查询的数据列,来自(FROM)用于指定待查询数据的表,条件(WHERE)用于指定查询结果的过滤条件,排序(ORDER BY)用于对查询结果按照指定的列进行排序,等等。Keywords are special words or characters used to specify query conditions in data query expressions, where a data query expression can contain multiple query conditions. For example, keywords may include query (SELECT): used to select the data columns to be queried from the database, from (FROM) used to specify the table of data to be queried, condition (WHERE) used to specify the filtering conditions for query results, sort (ORDER BY) used to sort the query results according to the specified column, and so on.

可以理解的是,EL语言的数据查询表达式之所以可以与多个数据库管理系统交互,是因为它并不直接访问数据库,而是通过Java数据库连接(Java Database Connectivity,JDBC)来访问数据库,其中,JDBC是一种Java接口,用于在Java应用程序中执行SQL语句,并与多个数据库管理系统进行交互。如此,通过EL语言的数据查询表达式,可以轻松地将应用程序连接到多种类型和版本的数据库管理系统,从而获得更广泛的功能和互操作性。此外,EL语言的数据查询表达式还具有良好的可移植性和跨平台性,可以在多个平台和应用程序服务器上运行。It is understandable that the reason why the data query expression of EL language can interact with multiple database management systems is that it does not directly access the database, but accesses the database through Java Database Connectivity (JDBC), where JDBC is a Java interface used to execute SQL statements in Java applications and interact with multiple database management systems. In this way, through the data query expression of EL language, applications can be easily connected to multiple types and versions of database management systems, thereby obtaining a wider range of functions and interoperability. In addition, the data query expression of EL language also has good portability and cross-platform, and can run on multiple platforms and application servers.

数据查询表达式的获取方法可以是: The method for obtaining data query expressions can be:

(1)、若查询优化器在数据库管理系统外运行时,多个数据库管理系统可以使用一个查询优化器,在用户向任意一个数据库管理系统内输入数据查询命令后,任意一个数据库管理系统可以将数据查询命令转换成数据查询表达式,再将数据查询表达式发送至查询优化器;(1) If the query optimizer runs outside the database management system, multiple database management systems can use one query optimizer. After the user enters a data query command into any database management system, any database management system can convert the data query command into a data query expression and then send the data query expression to the query optimizer.

(2)、若查询优化器在数据库管理系统内运行时,在用户向数据库管理系统输入数据查询命令后,数据库管理系统可以将数据查询命令转换成数据查询表达式,再将数据查询表达式传送至查询优化器;(2) If the query optimizer is running in a database management system, after the user inputs a data query command into the database management system, the database management system can convert the data query command into a data query expression, and then transmit the data query expression to the query optimizer;

其中,查询优化器为提供查询优化技术的组件。Among them, the query optimizer is a component that provides query optimization technology.

在一些实施例中,考虑到数据库管理系统的类型可以包括关系型数据库管理系统、非关系型数据库管理系统、内存数据库管理系统、分布式数据库管理系统和数据仓库等,为了使查询优化器可以布置在多个不同的数据库管理系统中,获取数据查询表达式,包括:In some embodiments, considering that the types of database management systems may include relational database management systems, non-relational database management systems, in-memory database management systems, distributed database management systems, and data warehouses, etc., in order to enable the query optimizer to be deployed in multiple different database management systems, obtaining a data query expression includes:

获取数据查询命令,数据查询命令包括关键词;Get data query command, the data query command includes keywords;

根据关键词,对数据查询命令进行语法分析处理,得到数据查询命令的语法树;According to the keywords, the data query command is parsed and processed to obtain the syntax tree of the data query command;

采用预设数据交换语言,对语法树进行格式转换处理,得到数据查询表达式。The preset data exchange language is used to perform format conversion on the syntax tree to obtain a data query expression.

其中,数据查询命令为用户从数据库管理系统中查询数据时所输入的命令。比如,数据查询命令可以是用户输入的查询字段,还可以是用户的语音命令,还可以是用户编写的数据查询脚本运行时所输出的命令,等等。The data query command is a command input by a user when querying data from a database management system. For example, the data query command can be a query field input by a user, a voice command of a user, or a command output when a data query script written by a user is executed, etc.

关键词可以与数据查询命令中的查询条件对应。Keywords can correspond to query conditions in data query commands.

比如,数据查询命令为从表T1的第a列和表T2的第b列中筛选出相同的数据,如此,数据查询命令中的查询条件可以包括查询表T1的第a列、查询表T2的第b列以及从表T1的第a列和表T2的第b列中筛选出相同的数据,关键词可以用于指示查询表T1的第a列,还可以用于指示查询表T2的第b列,还可以用于指示从表T1的第a列和表T2的第b列中筛选出相同的数据。For example, the data query command is to filter out the same data from the ath column of table T1 and the bth column of table T2. In this way, the query conditions in the data query command may include querying the ath column of table T1, querying the bth column of table T2, and filtering out the same data from the ath column of table T1 and the bth column of table T2. The keyword can be used to indicate querying the ath column of table T1, to indicate querying the bth column of table T2, and to indicate filtering out the same data from the ath column of table T1 and the bth column of table T2.

语法树表示数据查询命令的查询逻辑,其中,语法树包括多个节点,代表不同的操作。The syntax tree represents the query logic of the data query command, wherein the syntax tree includes multiple nodes representing different operations.

比如,在语法树中,SELECT节点表示要查询数据的操作,列(column)1和column2节点等表示要查询的列,FROM节点表示要从哪些表中查询,table1和table2是这些表的名称,WHERE节点表示要对查询结果进行限制的条件,等等。For example, in the syntax tree, the SELECT node represents the operation of querying data, column 1 and column 2 nodes represent the columns to be queried, the FROM node represents which tables to query from, table1 and table2 are the names of these tables, the WHERE node represents the conditions to limit the query results, and so on.

预设数据交互语言可以使查询优化器与不同类型的数据库管理系统交互。比如,预设数据交互语言可以是EL语言、Query DSL语言、JPQL语言,等等。The preset data interaction language can enable the query optimizer to interact with different types of database management systems. For example, the preset data interaction language can be EL language, Query DSL language, JPQL language, etc.

120、根据关键词,得到数据查询表达式中的基础子表达式,以及基础子表达式之间的关联关系。120. According to the keywords, basic sub-expressions in the data query expression and the association relationship between the basic sub-expressions are obtained.

其中,基础子表达式用于实现数据查询表达式中关键词指示的查询条件。比如,基础子表达式可以用于从数据库管理系统中选择需要查询数据的表的数据列,还可以用于指定查询结果的过滤条件,还可以用于对查询结果按照指定的列进行排序,等等。The basic sub-expression is used to implement the query conditions indicated by the keywords in the data query expression. For example, the basic sub-expression can be used to select the data columns of the table to be queried from the database management system, to specify the filtering conditions of the query results, to sort the query results according to the specified columns, and so on.

关联关系用于指示基础子表达式执行的逻辑顺序。比如,关联关系可以包括基础子表达式之间的并列关系、递进关系、与关系(AND关系),等等,其中,AND关系用于将多个查询条件限定在同一个查询中,只有同时满足所有条件时,才能返回符合要求的数据。 The association relationship is used to indicate the logical order of execution of the basic sub-expressions. For example, the association relationship can include parallel relationships, progressive relationships, and AND relationships between basic sub-expressions, etc. Among them, the AND relationship is used to limit multiple query conditions to the same query. Only when all conditions are met at the same time can the data that meets the requirements be returned.

比如,若数据查询表达式为“SELECT T1.a FROM T1,T2 WHERE T1.a=T2.b ORDER BY T1.a”时,该数据查询表达式表示利用了表连接的方式,连接了表T1和表T2,连接条件是T1.a=T2.b,意味着只有在表T1中的a列的值等于表T2中的b列的值才会被返回,即通过连接条件是T1.a=T2.b过滤出符合查询条件的数据,并按照T1.a列的值进行排序。For example, if the data query expression is "SELECT T1.a FROM T1,T2 WHERE T1.a=T2.b ORDER BY T1.a", the data query expression indicates that the table join method is used to connect table T1 and table T2. The join condition is T1.a=T2.b, which means that only the value of column a in table T1 is equal to the value of column b in table T2 will be returned. That is, the data that meets the query conditions is filtered out through the join condition T1.a=T2.b, and the data is sorted according to the value of column T1.a.

通过数据查询表达式中的“SELECT T1.a”、“FROM T1,T2”、“WHERE T1.a=T2.b”、“ORDER BY T1.a”,可以得到基础子表达式1、基础子表达式2和基础子表达式3,基础子表达式1用于检索表T1的第a列数据,基础子表达式2用于检索表T2的第b列数据,基础子表达式3用于指示表T1和表T2之间的内部连接(Inner Join(T1.a=T2.b)),用于过滤基础子表达式1的查询结果和基础子表达式2的查询结果。Through "SELECT T1.a", "FROM T1, T2", "WHERE T1.a=T2.b", "ORDER BY T1.a" in the data query expression, we can get basic sub-expression 1, basic sub-expression 2 and basic sub-expression 3. Basic sub-expression 1 is used to retrieve the a-th column data of table T1, basic sub-expression 2 is used to retrieve the b-th column data of table T2, and basic sub-expression 3 is used to indicate the inner join between table T1 and table T2 (Inner Join(T1.a=T2.b)), which is used to filter the query results of basic sub-expression 1 and basic sub-expression 2.

其中,基础子表达式1和基础子表达式2之间存在并列关系,基础子表达式1和基础子表达式2分别与基础子表达式3之间存在AND关系,即在基础子表达式1和基础子表达式2分别获取查询结果后,可以通过基础子表达式3过滤基础子表达式1的查询结果和基础子表达式2的查询结果。Among them, there is a parallel relationship between basic sub-expression 1 and basic sub-expression 2, and there is an AND relationship between basic sub-expression 1 and basic sub-expression 2 and basic sub-expression 3 respectively, that is, after basic sub-expression 1 and basic sub-expression 2 obtain query results respectively, the query results of basic sub-expression 1 and the query results of basic sub-expression 2 can be filtered through basic sub-expression 3.

130、对基础子表达式进行扩展,得到基础子表达式的等效子表达式,等效子表达式和基础子表达式具有等效的表达含义。130. Expand the basic sub-expression to obtain an equivalent sub-expression of the basic sub-expression, and the equivalent sub-expression and the basic sub-expression have equivalent expression meanings.

等效子表达式可以用于替代基础子表达式,与基础子表达式具有相同的查询结果,但查询逻辑不同,其中,基础子表达式对应一个或多个等效子表达式。The equivalent sub-expression can be used to replace the basic sub-expression, and has the same query result as the basic sub-expression, but the query logic is different, wherein the basic sub-expression corresponds to one or more equivalent sub-expressions.

等效的表达含义用于限定等效子表达式和基础子表达式有相同的语义,使等效子表达式的查询结果和基础子表达式的查询结果相同。The equivalent expression meaning is used to limit the equivalent sub-expression and the basic sub-expression to have the same semantics, so that the query result of the equivalent sub-expression is the same as the query result of the basic sub-expression.

比如,基础子表达式为读取(Get)表T1的第a列数据的子表达式,则等效子表达式可以是扫描(Scan)表T1的第a列数据的子表达式,还可以是先Scan表T1的数据后,再重新分配(Redistribute)表T1中的第a列数据组合得到的子表达式,等等,如此,基础子表达式和等效子表达式都可以查询表T1中的第a列数据。For example, if the basic sub-expression is a sub-expression that reads (Get) the data in the a-th column of table T1, then the equivalent sub-expression can be a sub-expression that scans (Scan) the data in the a-th column of table T1, or it can be a sub-expression that first scans the data in table T1 and then redistributes (Redistribute) the data in the a-th column of table T1, and so on. In this way, both the basic sub-expression and the equivalent sub-expression can query the data in the a-th column of table T1.

在一些实施例中,为了便于获取等效子表达式,对基础子表达式进行扩展,得到基础子表达式的等效子表达式,包括:In some embodiments, in order to facilitate obtaining an equivalent sub-expression, the basic sub-expression is expanded to obtain an equivalent sub-expression of the basic sub-expression, including:

获取预设表达式集合,预设表达式集合包括多个预设表达式,多个预设表达式具有等效的表达含义;Obtaining a preset expression set, the preset expression set including multiple preset expressions, and the multiple preset expressions have equivalent expression meanings;

根据基础子表达式,从预设表达式集合中确定等效子表达式。Based on the base sub-expression, an equivalent sub-expression is determined from a preset set of expressions.

其中,预设表达式集合为多个具有等效表达含义的预设表达式的集合。The preset expression set is a set of multiple preset expressions with equivalent expression meanings.

比如,预设表达式集合中记录有多个不同查询逻辑的预设表达式,但通过预设表达式集合中每个预设表达式具有相同的表达含义,可以得到相同的查询结果,通过预设表达式集合,便于确定基础子表达式的等效子表达式。For example, a preset expression set records multiple preset expressions with different query logics, but each preset expression in the preset expression set has the same expression meaning and can obtain the same query result. The preset expression set makes it easy to determine equivalent sub-expressions of basic sub-expressions.

140、从基础子表达式和基础子表达式的等效子表达式中确定待构建子表达式。140. Determine the sub-expression to be constructed from the basic sub-expression and the equivalent sub-expression of the basic sub-expression.

其中,待构建子表达式为基础子表达式和基础子表达式的等效子表达式中查询成本满足预设条件的子表达式。比如,待构建子表达式可以是基础子表达式和等效子表达式中查询成本最小的子表达式,还可以是查询成本满足预设条件的任意一个子表达式,等等。 The sub-expression to be constructed is a sub-expression whose query cost satisfies a preset condition among the basic sub-expression and the equivalent sub-expression of the basic sub-expression. For example, the sub-expression to be constructed can be a sub-expression with the smallest query cost among the basic sub-expression and the equivalent sub-expression, or any sub-expression whose query cost satisfies the preset condition, and so on.

在一些实施例中,考虑到子表达式可以指示表连接,表连接用于组合两个或多个表之间的数据,以便查询跨多个表的信息,为了从基础子表达式和等效子表达式中获取查询成本小的子表达式,从基础子表达式和基础子表达式的等效子表达式中确定待构建子表达式,包括:In some embodiments, considering that a sub-expression may indicate a table join, which is used to combine data between two or more tables so as to query information across multiple tables, in order to obtain a sub-expression with a small query cost from a basic sub-expression and an equivalent sub-expression, a sub-expression to be constructed is determined from the basic sub-expression and the equivalent sub-expression of the basic sub-expression, including:

获取第一条件数量和第二条件数量,第一条件数量为基础子表达式关联的数据连接条件的条件数量,第二条件数量为基础子表达式的等效子表达式关联的数据连接条件的条件数量;Get a first condition number and a second condition number, the first condition number is the condition number of the data connection condition associated with the base sub-expression, and the second condition number is the condition number of the data connection condition associated with the equivalent sub-expression of the base sub-expression;

根据第一条件数量和第二条件数量,从基础子表达式和基础子表达式的等效子表达式中确定待构建子表达式。According to the first conditional quantity and the second conditional quantity, a sub-expression to be constructed is determined from a basic sub-expression and an equivalent sub-expression of the basic sub-expression.

其中,数据连接条件用于指定至少两个表之间的连接关系,以便对至少两个表中的数据进行组合,其中,数据连接条件可以使用比较运算符(=,<,>,<=,>=等)和逻辑运算符(AND、OR等)进行组合。The data connection condition is used to specify the connection relationship between at least two tables so as to combine the data in at least two tables, wherein the data connection condition can be combined using comparison operators (=, <, >, <=, >=, etc.) and logical operators (AND, OR, etc.).

可以理解的是,数据连接条件的数量可以是多条。比如,假设有两个表员工数据(employee)表和部门数据(department)表,它们分别包含员工和部门信息,使用内连接(Inner Join)连接了employee表和department表,并使用两个数据连接条件:员工部门标识(employee.department_id)=部门标识(department.id),以及员工所在城市(employee.city)=部门所在城市(department.city),如此,可以确保只返回相同城市并且在同一部门中的员工和部门记录,并获取所需的员工姓名、部门名称和地址信息。It is understandable that the number of data connection conditions can be multiple. For example, suppose there are two tables, employee data table and department data table, which contain employee and department information respectively. The employee table and the department table are connected using inner join, and two data connection conditions are used: employee department ID (employee.department_id) = department ID (department.id), and employee city (employee.city) = department city (department.city). In this way, it can be ensured that only employee and department records in the same city and in the same department are returned, and the required employee name, department name and address information are obtained.

第一条件数量为基础子表达式表示至少两个表之间的连接关系的条件数据。The first conditional number represents conditional data of a connection relationship between at least two tables as a base sub-expression.

第二条件数量为等效子表达式表示至少两个表之间的连接关系的条件数据。The second condition number is condition data of a connection relationship between at least two tables represented by an equivalent sub-expression.

比如,Inner Join通常指的是基于比较运算符(如等于号“=”)来连接两个表的操作,即根据两个表中相应的列值是否相等来判断是否需要将对应的行连接起来,因此,在Inner Join操作中,数据连接条件通常只有一个或少数几个。For example, Inner Join usually refers to an operation to connect two tables based on a comparison operator (such as the equal sign "="), that is, judging whether the corresponding rows need to be connected based on whether the corresponding column values in the two tables are equal. Therefore, in the Inner Join operation, there is usually only one or a few data connection conditions.

内部哈希连接(Inner Hash Join)是基于哈希表来实现表之间的连接,在处理大规模数据时性能表现优异。其主要步骤是将需要连接的两个表中的数据分别存储在内存中的哈希表中,并使用哈希函数将它们映射到对应的哈希桶中。然后,对于每个哈希桶中的数据,通过比较它们之间的连接条件来进行连接。Inner Hash Join is based on hash tables to connect tables, and has excellent performance when processing large-scale data. The main steps are to store the data in the two tables to be connected in the hash table in memory respectively, and use the hash function to map them to the corresponding hash buckets. Then, for the data in each hash bucket, the connection is performed by comparing the connection conditions between them.

由于Inner Hash Join需要建立哈希表等额外的数据结构,因此在实际情况中通常需要更多的内存和计算资源。Since Inner Hash Join needs to build additional data structures such as hash tables, it usually requires more memory and computing resources in actual situations.

为了根据第一条件数量和第二条件数量,从基础子表达式和基础子表达式的等效子表达式中确定待构建子表达式:To determine the sub-expression to be constructed from the basic sub-expression and the equivalent sub-expression of the basic sub-expression according to the first conditional quantity and the second conditional quantity:

可以在表中的数据量较小时,从第一条件数量和第二条件数量中筛选出数量小的条件数量,若数量小的条件数量为第一条件数量,则将基础子表达式作为待构建子表达式,若数量小的条件数量为第二条件数量,则将该第二条件数量对应的等效子表达式作为待构建子表达式。When the amount of data in the table is small, a smaller number of conditions can be screened out from the first number of conditions and the second number of conditions. If the smaller number of conditions is the first number of conditions, the basic sub-expression is used as the sub-expression to be constructed. If the smaller number of conditions is the second number of conditions, the equivalent sub-expression corresponding to the second number of conditions is used as the sub-expression to be constructed.

可以在表中记录有大规模数据时,从第一条件数量和第二条件数量中筛选出数量多的条件数量,若数量大的条件数量为第一条件数量,则将基础子表达式作为待构建子表达式, 若数量大的条件数量为第二条件数量,则将该第二条件数量对应的等效子表达式作为待构建子表达式,如此,可以高效的执行数据查询,以便处理大规模数据时可以提供较好的性能表现。When a large amount of data is recorded in the table, the condition quantity with a large number can be screened out from the first condition quantity and the second condition quantity. If the condition quantity with a large number is the first condition quantity, the basic sub-expression is used as the sub-expression to be constructed. If the larger number of conditions is the second number of conditions, the equivalent sub-expression corresponding to the second number of conditions is used as the sub-expression to be constructed. In this way, data query can be executed efficiently, so as to provide better performance when processing large-scale data.

在一些实施例中,考虑到查询同一数据集可以采用多个表达式,每个表达式的查询逻辑不同,从而每个表达式具有不同的查询成本,为了可以为了从基础子表达式和基础子表达式的等效子表达式中获取查询成本小的子表达式,从基础子表达式和基础子表达式的等效子表达式中确定待构建子表达式,包括:In some embodiments, considering that multiple expressions may be used to query the same data set, each expression has a different query logic, and thus each expression has a different query cost, in order to obtain a sub-expression with a small query cost from a basic sub-expression and an equivalent sub-expression of the basic sub-expression, determining a sub-expression to be constructed from the basic sub-expression and the equivalent sub-expression of the basic sub-expression includes:

获取基础子表达式关联的待查询数据的元数据信息;Obtain metadata information of the data to be queried associated with the basic sub-expression;

从元数据信息中获取待查询数据的数据量;Obtain the data volume of the data to be queried from the metadata information;

根据数据量,从基础子表达式和基础子表达式的等效子表达式中确定待构建子表达式。According to the amount of data, the sub-expression to be constructed is determined from the basic sub-expression and the equivalent sub-expression of the basic sub-expression.

其中,待查询数据为基础子表达式需要查询的表中的数据。比如基础子表达式用于查询表A,待查询数据为表A中的数据。The data to be queried is the data in the table that the basic sub-expression needs to query. For example, the basic sub-expression is used to query table A, and the data to be queried is the data in table A.

元数据信息为描述待查询数据的数据信息。比如,元数据信息用于描述待查询数据,包括数据的格式、规模、编码方式、访问协议等方面的信息。Metadata information is data information that describes the data to be queried. For example, metadata information is used to describe the data to be queried, including information such as the format, size, encoding method, and access protocol of the data.

数据量为用户需要查询的待查询数据的规模大小,通常以记录数或文件大小等形式表示。比如,通过元数据信息,可以获取待查询数据的规模。The data volume is the size of the data to be queried that the user needs to query, usually expressed in the form of number of records or file size, etc. For example, the size of the data to be queried can be obtained through metadata information.

比如,基础子表达式为Get(T1),基础子表达式的等效子表达式为Scan(T1),Get(T1)和Scan(T1)用于查询表T1中的待查询数据。For example, the basic sub-expression is Get(T1), and the equivalent sub-expression of the basic sub-expression is Scan(T1). Get(T1) and Scan(T1) are used to query the data to be queried in table T1.

使用基础子表达式为Get(T1)的场景可以包括:Scenarios where the basic sub-expression Get(T1) is used include:

1、需要获取指定行键的单个记录;1. You need to get a single record with a specified row key;

2、只需要查询少量的数据记录,如获取某个用户的基本信息等;2. Only a small amount of data records need to be queried, such as obtaining basic information of a user;

3、需要查询某行的多个版本记录。3. You need to query multiple versions of a row.

使用等效子表达式为Scan(T1)的场景可以包括:Scenarios where the equivalent subexpression Scan(T1) is used include:

1、需要查询大量的数据记录,如批量统计分析数据等;1. Need to query a large number of data records, such as batch statistical analysis data;

2、需要按照特定条件进行过滤或排序,如时间、数值等;2. Need to filter or sort according to specific conditions, such as time, value, etc.;

3、需要遍历整个表或指定范围内的多行数据,如数据备份、归档等。3. It is necessary to traverse the entire table or multiple rows of data in a specified range, such as data backup and archiving.

若数据量小,则可以采用基础子表达式Get(T1)作为待构建子表达式,若数据量大,则可以采用等效子表达式Scan(T1)作为待构建子表达式。If the amount of data is small, the basic sub-expression Get(T1) can be used as the sub-expression to be constructed. If the amount of data is large, the equivalent sub-expression Scan(T1) can be used as the sub-expression to be constructed.

150、通过待构建子表达式和关联关系构建数据查询计划。150. Build a data query plan through the sub-expressions to be built and the associated relationships.

其中,数据查询计划为待构建子表达式依据关联关系构建的可执行的数据查询方案。The data query plan is an executable data query solution constructed based on the association relationship of the sub-expressions to be constructed.

比如,若数据查询表达式中存在3个基础子表达式,基础子表达式1、基础子表达式2和基础子表达式3,基础子表达式1和基础子表达式2的关联关系可以是并列关系,基础子表达式1和基础子表达式2分别与基础子表达式3之间的关联关系可以是AND关系。For example, if there are three basic sub-expressions in the data query expression, basic sub-expression 1, basic sub-expression 2 and basic sub-expression 3, the association relationship between basic sub-expression 1 and basic sub-expression 2 can be a parallel relationship, and the association relationship between basic sub-expression 1 and basic sub-expression 2 and basic sub-expression 3 respectively can be an AND relationship.

其中,待构建子表达式可以是基础子表达式或基础子表达式的等效子表达式,即存在基础从子表达式1和基础从子表达式1的等效子表达式1中确定的待构建子表达式1,从基础子表达式2和基础子表达式2的等效子表达式2中确定的待构建子表达式2,从基础 子表达式3和基础子表达式3的等效子表达式3中确定的待构建子表达式3,每个待构建子表达式与其它待构建子表达式之间存在关联关系。The subexpression to be constructed can be a basic subexpression or an equivalent subexpression of a basic subexpression, that is, there is a subexpression to be constructed 1 determined from basic subexpression 1 and the equivalent subexpression 1 of basic subexpression 1, a subexpression to be constructed 2 determined from basic subexpression 2 and the equivalent subexpression 2 of basic subexpression 2, and a subexpression to be constructed 3 determined from basic subexpression 2 and the equivalent subexpression 2 of basic subexpression 2. The sub-expressions 3 to be constructed determined in the sub-expressions 3 and the equivalent sub-expressions 3 of the basic sub-expressions 3, each sub-expression to be constructed has an association relationship with other sub-expressions to be constructed.

根据等效子表达式和基础子表达式具有等效的表达含义,等效子表达式1与基础子表达式2或等效子表达2的关联关系为并列关系;等效子表达式1与基础子表达式3或等效子表达3的关联关系可以是AND关系;等效子表达式2与基础子表达式3或等效子表达3的关联关系可以是AND关系。According to the equivalent sub-expression and the basic sub-expression having equivalent expression meanings, the association relationship between equivalent sub-expression 1 and basic sub-expression 2 or equivalent sub-expression 2 is a parallel relationship; the association relationship between equivalent sub-expression 1 and basic sub-expression 3 or equivalent sub-expression 3 can be an AND relationship; the association relationship between equivalent sub-expression 2 and basic sub-expression 3 or equivalent sub-expression 3 can be an AND relationship.

若待构建子表达式1与待构建子表达式2的关联关系为并列关系,待构建子表达式1和待构建子表达式2分别与待构建子表达式3的关联关系为AND关系,则依据待构建子表达式1与待构建子表达式2的关联关系为并列关系,控制待构建子表达式1和待构建子表达式2各自独立构建查询方法,再依据待构建子表达式1和待构建子表达式2分别与待构建子表达式3的关联关系为AND关系,构建待构建子表达式3筛选待构建子表达式1和待构建子表达式2的查询结果的查询方案。If the association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed is a parallel relationship, and the association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed and sub-expression 3 to be constructed is an AND relationship respectively, then based on the association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed being a parallel relationship, control sub-expression 1 to be constructed and sub-expression 2 to be constructed to construct query methods independently, and then based on the association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed and sub-expression 3 to be constructed being an AND relationship respectively, construct a query plan for sub-expression 3 to be constructed to filter the query results of sub-expression 1 to be constructed and sub-expression 2 to be constructed.

在一些实施例中,考虑到子表达式可以构建多个具有相同查询功能的数据查询计划,为了从多个数据查询计划中筛选查询成本小的数据查询计划,通过待构建子表达式和关联关系构建数据查询计划,包括:In some embodiments, considering that a sub-expression can construct multiple data query plans with the same query function, in order to filter a data query plan with a low query cost from multiple data query plans, a data query plan is constructed by using the sub-expression to be constructed and the association relationship, including:

从数据查询表达式中获取待构建子表达式关联的当前查询优化请求;Obtain the current query optimization request associated with the sub-expression to be constructed from the data query expression;

通过待构建子表达式构建初始子查询计划;Construct an initial subquery plan through the sub-expression to be constructed;

根据待构建子表达式关联的当前查询优化请求和关联关系,从初始子查询计划中确定待处理子查询计划;Determine the subquery plan to be processed from the initial subquery plan according to the current query optimization request and the association relationship associated with the subexpression to be constructed;

对待处理子查询计划进行合并处理,得到数据查询计划。The sub-query plans to be processed are merged to obtain a data query plan.

其中,当前查询优化请求为通过数据查询表达式中与待构建子表达式关联的查询条件构成的查询优化请求,查询条件可以是数据查询表达式中与待构建子表达式关联的数据分布、索引类型等,该当前查询优化请求可以用于优化待构建子表达式构建的数据查询计划。Among them, the current query optimization request is a query optimization request constructed by the query conditions associated with the sub-expression to be constructed in the data query expression. The query conditions can be the data distribution, index type, etc. associated with the sub-expression to be constructed in the data query expression. The current query optimization request can be used to optimize the data query plan constructed by the sub-expression to be constructed.

比如,当前查询优化请求可以是用于快速获取待查询数据的请求,还可以是用于减少查询成本的请求,还可以是在满足一定范围内的查询速度和查询成本的请求,等等。For example, the current query optimization request may be a request for quickly acquiring data to be queried, a request for reducing query costs, a request for satisfying query speed and query costs within a certain range, and so on.

初始子查询计划可以为待构建子表达式构建的多个未优化的子查询计划。比如,多个未优化的子查询计划包括查询成本高的子查询计划,还包括查询成本低的子查询计划,还包括查询速度慢的子查询计划,还包括查询速度快的子查询计划,等等。The initial subquery plan may be multiple unoptimized subquery plans constructed for the sub-expression to be constructed. For example, the multiple unoptimized subquery plans include subquery plans with high query costs, subquery plans with low query costs, subquery plans with slow query speeds, subquery plans with fast query speeds, and so on.

待处理子查询计划可以为依据关联关系,从初始子查询计划中获取的满足当前查询优化请求的初始子查询计划。The subquery plan to be processed may be an initial subquery plan obtained from the initial subquery plan based on the association relationship and satisfying the current query optimization request.

比如,基于数据查询表达式确定出待构建子表达式1、待构建子表达式2和待构建子表达式3,待构建子表达式1用于查询表T1,待构建子表达式2用于查询表T2中第b列数据,待构建子表达式3用于从待构建子表达式1和待构建子表达式2中获取的数据中筛选数据,数据查询表达式还包括按照表T1的第a列排序。待构建子表达式1和待构建子表达式2之间的关联关系为基础子表达式之间的并列关系,待构建子表达式1和待构建子表达式2分别与待构建子表达式3之间的关联关系为AND关系。 For example, based on the data query expression, sub-expression 1 to be constructed, sub-expression 2 to be constructed and sub-expression 3 to be constructed are determined, sub-expression 1 to be constructed is used to query table T1, sub-expression 2 to be constructed is used to query the data in column b of table T2, sub-expression 3 to be constructed is used to filter data from the data obtained from sub-expression 1 to be constructed and sub-expression 2 to be constructed, and the data query expression also includes sorting according to column a of table T1. The association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed is a parallel relationship between the basic sub-expressions, and the association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed and sub-expression 3 to be constructed is an AND relationship.

待构建子表达式1关联的当前查询优化请求可以用于使待构建子表达式1快速获取查询表T1,并使从表T1中查询的数据按照表的第a列排序。待构建子表达式2关联的当前查询优化请求可以用于使待构建子表达式2快速获取查询表T2中的第b列数据。待构建子表达式3关联的当前查询优化请求可以用于合并待构建子表达式3从所有数据库节点中收集的排序(按照表T1的第a列排序)后的筛选数据。The current query optimization request associated with sub-expression 1 to be constructed can be used to enable sub-expression 1 to be constructed to quickly obtain query table T1, and to sort the data queried from table T1 according to the ath column of the table. The current query optimization request associated with sub-expression 2 to be constructed can be used to enable sub-expression 2 to be constructed to quickly obtain the bth column data in query table T2. The current query optimization request associated with sub-expression 3 to be constructed can be used to merge the sorted (sorted according to the ath column of table T1) filtered data collected by sub-expression 3 to be constructed from all database nodes.

待构建子表达式1构建的初始子查询计划可以包括Get(T1)、Scan(T1)、Scan(T1)+Sort(T1.a),待构建子表达式2构建的初始子查询计划可以包括Scan(T2)+Redistribute(T2.b)、Get(T2.b)。The initial subquery plan constructed by the sub-expression 1 to be constructed may include Get(T1), Scan(T1), Scan(T1)+Sort(T1.a), and the initial subquery plan constructed by the sub-expression 2 to be constructed may include Scan(T2)+Redistribute(T2.b), Get(T2.b).

根据待构建子表达式关联的当前查询优化请求和关联关系,从初始子查询计划中确定待处理子查询计划可以是:According to the current query optimization request and the association relationship associated with the sub-expression to be constructed, the sub-query plan to be processed can be determined from the initial sub-query plan:

因并列关系,可以独立筛选待构建子表达式1和待构建子表达式2构建的初始子查询计划,通过待构建子表达式1关联的当前查询优化请求,可以从初始子查询计划Get(T1)、Scan(T1)、Scan(T1)+Sort(T1.a)中筛选出Scan(T1),并通过待构建子表达式2关联的当前查询优化请求,可以从初始子查询计划Scan(T2)+Redistribute(T2.b)、Get(T2.b)中筛选出Scan(T2)+Redistribute(T2.b)。Due to the parallel relationship, the initial subquery plans constructed by subexpression 1 to be constructed and subexpression 2 to be constructed can be independently filtered. Through the current query optimization request associated with subexpression 1 to be constructed, Scan(T1) can be filtered out from the initial subquery plan Get(T1), Scan(T1), Scan(T1)+Sort(T1.a), and through the current query optimization request associated with subexpression 2 to be constructed, Scan(T2)+Redistribute(T2.b) can be filtered out from the initial subquery plan Scan(T2)+Redistribute(T2.b), Get(T2.b).

根据待构建子表达式1和待构建子表达式2分别与待构建子表达式3之间的关联关系为AND关系,因待构建子表达式1未使表T1中的数据按照表的第a列排序,因此,待构建子表达式3构建的初始子计划可以是使Inner Hash Join(T1.a=T1.b)筛选后的数据按照表T1中的第a列排序,以实现数据查询表达式中的数据分布。According to the AND relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed and sub-expression 3 to be constructed, since sub-expression 1 to be constructed does not sort the data in table T1 according to the a-th column of the table, the initial sub-plan for constructing sub-expression 3 to be constructed can be to sort the data after filtering by Inner Hash Join (T1.a=T1.b) according to the a-th column of table T1, so as to realize the data distribution in the data query expression.

即待构建子表达式3构建的初始子查询计划可以是Inner Hash Join(T1.a=T1.b)+Sort(T1.a)+GatherMerge(T1.a)、Inner Hash Join(T1.a=T1.b)+Gather(T1.a)+Sort(T1.a),等等,其中,GatherMerge操作符用于从数据库所有分段收集排好序的数据到主节点,Gather用于从所有分段收集数据到主节点,通过待构建子表达式3关联的当前查询优化请求,可以从Inner Hash Join(T1.a=T1.b)+Sort(T1.a)+GatherMerge(T1.a)、Inner Hash Join(T1.a=T1.b)+Gather(T1.a)+Sort(T1.a)等中筛选出一个作为待处理子查询计划。That is, the initial subquery plan constructed by the subexpression 3 to be constructed can be Inner Hash Join(T1.a=T1.b)+Sort(T1.a)+GatherMerge(T1.a), Inner Hash Join(T1.a=T1.b)+Gather(T1.a)+Sort(T1.a), and so on, where the GatherMerge operator is used to collect sorted data from all segments of the database to the master node, and Gather is used to collect data from all segments to the master node. Through the current query optimization request associated with the subexpression 3 to be constructed, one can be selected from Inner Hash Join(T1.a=T1.b)+Sort(T1.a)+GatherMerge(T1.a), Inner Hash Join(T1.a=T1.b)+Gather(T1.a)+Sort(T1.a), etc. as the subquery plan to be processed.

将待构建子表达式1构建的待处理子查询计划Scan(T1)、待构建子表达式2构建的待处理子查询计划Scan(T2)+Redistribute(T2.b)、待构建子表达式3构建的待处理子查询计划Inner Hash Join(T1.a=T1.b)+Sort(T1.a)+GatherMerge(T1.a)合并,从而得到数据查询计划。Merge the pending subquery plan Scan(T1) constructed by the pending subexpression 1, the pending subquery plan Scan(T2)+Redistribute(T2.b) constructed by the pending subexpression 2, and the pending subquery plan Inner Hash Join(T1.a=T1.b)+Sort(T1.a)+GatherMerge(T1.a) constructed by the pending subexpression 3 to obtain the data query plan.

在一些实施例中,为了使表达式可以构建数据查询计划,通过待构建子表达式构建初始子查询计划,包括:In some embodiments, in order to enable the expression to construct a data query plan, an initial sub-query plan is constructed by using the sub-expression to be constructed, including:

获取待构建子表达式关联的查询运算符;Get the query operator associated with the sub-expression to be constructed;

根据待构建子表达式,构建查询运算符之间的连接关系;According to the sub-expressions to be constructed, the connection relationship between the query operators is constructed;

控制查询运算符按照连接关系进行排布,得到初始子查询计划。The query operators are controlled to be arranged according to the connection relationship to obtain the initial subquery plan.

其中,查询运算符可以支持从数据库中获取所需的数据。比如,查询运算符可以是Get、Scan、Sort、GatherMerge、Redistribute、Replicate(复制),等等。 The query operator may support obtaining required data from the database, for example, the query operator may be Get, Scan, Sort, GatherMerge, Redistribute, Replicate, and the like.

连接关系用于限制查询运算符执行的顺序。比如,待构建子表达式2用于获取查询表T2中的第b列数据,即待构建子表达式2关联的多个查询运算符(Get、Scan和Redistribute),待构建子表达式2构建的查询运算符之间的连接关系可以是先Scan再Redistribute,或是只有Get,如此,使查询运算符按照连接关系进行排布,可以得到初始子查询计划Scan(T2)+Redistribute(T2.b)、Get(T2.b),或Get(T2.b)。The connection relationship is used to restrict the order in which query operators are executed. For example, sub-expression 2 to be constructed is used to obtain the data in column b of query table T2, that is, multiple query operators (Get, Scan, and Redistribute) associated with sub-expression 2 to be constructed, and the connection relationship between the query operators constructed by sub-expression 2 to be constructed can be Scan first and then Redistribute, or only Get. In this way, the query operators are arranged according to the connection relationship, and the initial sub-query plan Scan(T2)+Redistribute(T2.b), Get(T2.b), or Get(T2.b) can be obtained.

在一些实施例中,为了计算表达式构建的数据查询计划的查询成本,根据待构建子表达式关联的当前查询优化请求和关联关系,从初始子查询计划中确定待处理子查询计划,包括:In some embodiments, in order to calculate the query cost of the data query plan constructed by the expression, according to the current query optimization request and the association relationship associated with the sub-expression to be constructed, the sub-query plan to be processed is determined from the initial sub-query plan, including:

根据当前查询优化请求,获取待构建子表达式关联的待查询数据的元数据信息;According to the current query optimization request, metadata information of the to-be-queried data associated with the to-be-constructed sub-expression is obtained;

通过待查询数据的元数据信息,分析初始子查询计划执行时所需的计算量;Analyze the amount of computation required when executing the initial subquery plan based on the metadata information of the data to be queried;

根据计算量和关联关系,从初始子查询计划中确定待处理子查询计划。According to the computation amount and the association relationship, the subquery plan to be processed is determined from the initial subquery plan.

其中,元数据信息为描述待查询数据的数据信息。比如,元数据信息记录了关于数据的各种属性和特征,如数据类型、数据来源、数据存储方式、数据访问权限等,其中,数据的各种属性包括描述表的属性和特征,如表名、列名、列数据类型、列约束条件、索引信息、数据规模等。The metadata information is data information describing the data to be queried. For example, the metadata information records various attributes and characteristics of the data, such as data type, data source, data storage method, data access rights, etc. The various attributes of the data include the attributes and characteristics describing the table, such as table name, column name, column data type, column constraints, index information, data size, etc.

计算量为在执行初始子查询计划的所需计算操作和资源消耗,可以理解为查询成本。比如,计算量可以包括扫描数据文件的规模或时间、读取索引的规模或时间,等等。The computational amount is the computational operations and resource consumption required to execute the initial subquery plan, which can be understood as the query cost. For example, the computational amount may include the size or time of scanning data files, the size or time of reading indexes, and so on.

待构建子表达式1和待构建子表达式2之间的关联关系为子表达式之间的并列关系,待构建子表达式1和待构建子表达式2分别与待构建子表达式3之间的关联关系为AND关系,根据计算量和关联关系,从初始子查询计划中确定子查询计划可以是:The association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed is a parallel relationship between sub-expressions, and the association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed and sub-expression 3 to be constructed is an AND relationship. According to the calculation amount and the association relationship, the sub-query plan determined from the initial sub-query plan can be:

由于待构建子表达式1和待构建子表达式2之间的关联关系为并列关系,则可以直接依据计算量从待构建子表达式1和待构建子表达式2构建的初始子查询计划中筛选待处理子查询计划。Since the association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed is a parallel relationship, the sub-query plan to be processed can be directly screened from the initial sub-query plan constructed by sub-expression 1 to be constructed and sub-expression 2 to be constructed based on the calculation amount.

由于待构建子表达式1和待构建子表达式2分别与待构建子表达式3之间的关联关系为AND关系,则在通过计算量筛选待构建子表达式3构建初始子查询计划时,还需要考虑待构建子表达式1构建的待处理子查询计划1和待构建子表达式2构建的待处理子查询计划2,避免待构建子表达式3构建的待处理子查询计划3与待处理子查询计划1和待处理子查询计划2存在重叠计划,或者避免待处理子查询计划1、待处理子查询计划2和待处理子查询计划3未完全执行所有查询条件。Since the association relationship between sub-expression 1 to be constructed and sub-expression 2 to be constructed and sub-expression 3 to be constructed is an AND relationship, when constructing the initial subquery plan by filtering sub-expression 3 to be constructed through calculation amount, it is also necessary to consider sub-query plan 1 to be processed constructed by sub-expression 1 to be constructed and sub-query plan 2 to be processed constructed by sub-expression 2 to be constructed, so as to avoid overlapping plans between sub-query plan 3 to be processed constructed by sub-expression 3 to be constructed and sub-query plan 1 to be processed and sub-query plan 2 to be processed, or to avoid sub-query plan 1 to be processed, sub-query plan 2 to be processed and sub-query plan 3 to be processed from failing to fully execute all query conditions.

在一些实施例中,为了使每个子表达式构建的数据查询计划不存在冲突或重复的逻辑,待构建子表达式包括第一表达式和第二表达式,第一表达式用于查询待查询数据,第二表达式用于筛选待查询数据,根据待构建子表达式关联的当前查询优化请求和关联关系,从初始子查询计划中确定待处理子查询计划,包括:In some embodiments, in order to ensure that the data query plan constructed by each sub-expression does not have conflicting or repeated logic, the sub-expression to be constructed includes a first expression and a second expression, the first expression is used to query the data to be queried, and the second expression is used to filter the data to be queried, and according to the current query optimization request and the association relationship associated with the sub-expression to be constructed, the sub-query plan to be processed is determined from the initial sub-query plan, including:

根据第一表达式关联的当前查询优化请求,从第一初始子查询计划中确定第一子查询计划,第一初始子查询计划为通过第一表达式构建的初始子查询计划;Determine a first subquery plan from a first initial subquery plan according to a current query optimization request associated with the first expression, where the first initial subquery plan is an initial subquery plan constructed by using the first expression;

基于第一子查询计划满足的当前查询优化请求,从第一表达式关联的当前查询优化请求中确定未满足请求; Based on the current query optimization request satisfied by the first subquery plan, determining unsatisfied requests from the current query optimization request associated with the first expression;

根据未满足请求和关联关系,从第二表达式关联的当前查询优化请求中确定最终优化请求,最终优化请求包括未满足请求;According to the unsatisfied requests and the association relationship, determine a final optimization request from the current query optimization requests associated with the second expression, where the final optimization request includes the unsatisfied request;

根据最终优化请求,从第二初始子查询计划中确定第二子查询计划,第二初始子查询计划为通过第二表达式构建的初始子查询计划,待处理子查询计划包括第一子查询计划和第二子查询计划。According to the final optimization request, a second subquery plan is determined from the second initial subquery plan, where the second initial subquery plan is an initial subquery plan constructed by using the second expression, and the subquery plans to be processed include the first subquery plan and the second subquery plan.

其中,第一表达式用于查询待查询数据。比如,第一表达式可以是上述待构建子表达式1和待构建子表达式2,待构建子表达式1用于查询表T1,待构建子表达式2用于查询表T2中第b列数据。The first expression is used to query the data to be queried. For example, the first expression can be the above-mentioned sub-expression 1 to be constructed and the sub-expression 2 to be constructed, the sub-expression 1 to be constructed is used to query table T1, and the sub-expression 2 to be constructed is used to query the data in column b of table T2.

第一表达式关联的当前查询优化请求用于筛选检索待查询数据时所需的初始子查询计划。The current query optimization request associated with the first expression is used to filter the initial subquery plan required for retrieving the data to be queried.

第一初始子查询计划为通过第一表达式构建的初始子查询计划,用于查询待查询数据,其中,第一表达式构建的第一初始子查询计划可以是一个或多个。The first initial sub-query plan is an initial sub-query plan constructed by the first expression, and is used to query the data to be queried, wherein the first initial sub-query plan constructed by the first expression may be one or more.

第一子查询计划为依据第一表达式关联的当前查询优化请求筛选的查询成本满足条件的第一初始子查询。The first sub-query plan is a first initial sub-query that satisfies a condition of a query cost selected according to a current query optimization request associated with a first expression.

第一子查询计划满足的当前查询优化请求为第一子查询计划在执行时可以满足的查询条件。The current query optimization request satisfied by the first sub-query plan is a query condition that can be satisfied when the first sub-query plan is executed.

未满足请求为第一表达式关联的当前查询优化请求中第一子查询计划未实现的查询优化请求。The unsatisfied request is a query optimization request that is not fulfilled by the first subquery plan in the current query optimization request associated with the first expression.

比如,第一表达式为待构建子表达式1时,由上述例子可知,待构建子表达式1关联的当前查询优化请求可以用于使待构建子表达式1快速获取查询表T1,并使从表T1中查询的数据按照表的第a列排序,若第一表达式依据第一表达式关联的当前查询优化请求构建的第一子查询计划为Scan(T1),则第一子查询计划满足的当前查询优化请求仅包括获取查询表T1,未满足请求为使从表T1中查询的数据按照表的第a列排序。For example, when the first expression is sub-expression 1 to be constructed, it can be seen from the above example that the current query optimization request associated with sub-expression 1 to be constructed can be used to enable sub-expression 1 to be constructed to quickly obtain query table T1, and sort the data queried from table T1 according to the a-th column of the table. If the first sub-query plan constructed by the first expression according to the current query optimization request associated with the first expression is Scan(T1), then the current query optimization request satisfied by the first sub-query plan only includes obtaining query table T1, and the unsatisfied request is to sort the data queried from table T1 according to the a-th column of the table.

最终优化请求为依据未满足请求所得到第二表达式关联的当前查询优化请求,以通过最终优化请求和第一子查询计划满足的查询优化请求可以实现数据查询表达式中的所有查询条件。The final optimization request is a current query optimization request associated with the second expression obtained based on the unsatisfied request, so that all query conditions in the data query expression can be realized through the query optimization request satisfied by the final optimization request and the first sub-query plan.

比如,由前述例子可知,第二表达式为待构建子表达式3时,待构建子表达式3关联的当前查询优化请求可以用于合并待构建子表达式3从所有数据库节点中收集的排序(按照表T1的第a列排序)后的筛选数据,由于待构建子表达式1构建的第一子查询计划仅可以获取查询表T1,并未使从表T1中查询的数据按照表的第a列排序(未满足请求),如此,第二表达式关联的当前查询优化请求需要包括未满足请求,以实现数据查询表达式中的查询条件。For example, as can be seen from the above example, when the second expression is the sub-expression 3 to be constructed, the current query optimization request associated with the sub-expression 3 to be constructed can be used to merge the sorted (sorted according to the ath column of table T1) filtered data collected by the sub-expression 3 to be constructed from all database nodes. Since the first sub-query plan constructed by the sub-expression 1 to be constructed can only obtain the query table T1, the data queried from table T1 is not sorted according to the ath column of the table (the request is not met), thus, the current query optimization request associated with the second expression needs to include the unsatisfied request to realize the query condition in the data query expression.

第二初始子查询计划为通过第二表达式构建的初始子查询计划。The second initial subquery plan is an initial subquery plan constructed by using the second expression.

第二子查询计划为依据最终优化请求筛选的查询成本满足条件的第二初始子查询。The second sub-query plan is a second initial sub-query that meets the condition based on the query cost selected by the final optimization request.

在一些实施例中,为了缩短构建数据查询计划的时间,本申请实施例提供的方法还包括: In some embodiments, in order to shorten the time of constructing a data query plan, the method provided in the embodiment of the present application further includes:

获取待构建子表达式关联的历史查询优化请求,以及与历史查询优化请求对应的历史查询计划;Obtain the historical query optimization request associated with the sub-expression to be constructed, and the historical query plan corresponding to the historical query optimization request;

从历史查询优化请求中确定当前查询优化请求对应的待处理历史请求;Determine, from the historical query optimization requests, a pending historical request corresponding to the current query optimization request;

若存在待处理历史请求,将待处理历史请求对应的历史查询计划作为待处理子查询计划;If there are pending historical requests, the historical query plan corresponding to the pending historical request is used as the pending sub-query plan;

若不存在待处理历史请求,触发执行通过待构建子表达式构建初始子查询计划的步骤。If there is no pending historical request, the step of constructing an initial subquery plan through the sub-expression to be constructed is triggered.

其中,历史查询优化请求为待构建子表达式在历史时间段所使用过的查询优化请求。The historical query optimization request is a query optimization request that has been used by the sub-expression to be constructed in a historical time period.

历史查询计划为待构建子表达式依据历史查询优化请求所构建的查询计划。The historical query plan is a query plan constructed by the sub-expression to be constructed according to the historical query optimization request.

待处理历史请求为与当前查询优化请求相同的历史查询优化请求。The pending historical request is a historical query optimization request that is the same as the current query optimization request.

比如,同一待构建子表达式在构建待处理子查询计划时,在不同查询时间段,可能面对多种不同的查询优化请求,若当前查询优化请求与历史查询优化请求相同时,即存在待处理历史请求,则将该待处理历史请求对应的历史查询计划作为待处理子查询计划,若不存在待处理历史请求,则直接通过待构建子表达式构建初始子查询计划,再通过当前查询优化请求筛选初始子查询计划,从而得到待处理子查询计划。For example, when the same sub-expression to be constructed is constructing a sub-query plan to be processed, it may face multiple different query optimization requests in different query time periods. If the current query optimization request is the same as the historical query optimization request, that is, there is a historical request to be processed, then the historical query plan corresponding to the historical request to be processed is used as the sub-query plan to be processed. If there is no historical request to be processed, the initial sub-query plan is directly constructed through the sub-expression to be constructed, and then the initial sub-query plan is filtered through the current query optimization request to obtain the sub-query plan to be processed.

在一些实施例中,为了便于获取待构建子表达式关联的历史查询优化请求,以及与历史查询优化请求对应的历史查询计划,包括:In some embodiments, in order to facilitate obtaining the historical query optimization request associated with the sub-expression to be constructed, and the historical query plan corresponding to the historical query optimization request, the following steps are included:

获取待构建子表达式关联的哈希表,哈希表包括历史查询优化请求,以及历史查询优化请求关联的第一标签和计划调取链接,第一标签为历史查询优化请求经过哈希函数计算后得到的哈希值;Obtain a hash table associated with the sub-expression to be constructed, the hash table including historical query optimization requests, and first tags and plan retrieval links associated with the historical query optimization requests, the first tag being a hash value obtained after the historical query optimization request is calculated by a hash function;

从历史查询优化请求中确定当前查询优化请求对应的待处理历史请求,包括:Determine the pending historical request corresponding to the current query optimization request from the historical query optimization requests, including:

确定第二标签,第二标签为当前查询优化请求经过哈希函数计算后得到的哈希值;Determine a second tag, where the second tag is a hash value obtained after the current query optimization request is calculated by a hash function;

根据第一标签和第二标签,从历史查询优化请求中确定当前查询优化请求对应的待处理历史请求;Determine, from the historical query optimization requests, a to-be-processed historical request corresponding to the current query optimization request according to the first tag and the second tag;

若存在待处理历史请求,将待处理历史请求对应的历史查询计划作为待处理子查询计划,包括:If there are pending historical requests, the historical query plan corresponding to the pending historical request is used as the pending sub-query plan, including:

若存在待处理历史请求,采用待处理历史请求关联的计划调取链接,调取历史查询优化请求对应的历史查询计划作为待处理子查询计划。If there are pending historical requests, the plan retrieval link associated with the pending historical request is used to retrieve the historical query plan corresponding to the historical query optimization request as the pending sub-query plan.

其中,哈希表可以使哈希值与历史查询优化请求映射,且历史查询优化请求携带有历史查询计划,以便快速从历史查询优化请求筛选与当前查询优化请求相同的待处理历史请求,如此,可以得到待处理历史请求所对应的历史查询计划。Among them, the hash table can map the hash value with the historical query optimization request, and the historical query optimization request carries the historical query plan, so as to quickly filter the pending historical requests that are the same as the current query optimization request from the historical query optimization request, so that the historical query plan corresponding to the pending historical request can be obtained.

第一标签为历史查询优化请求经过哈希函数计算后得到的哈希值。The first tag is the hash value obtained after the historical query optimization request is calculated by the hash function.

计划调取链接为调取待处理历史请求使用的查询优化计划的链接。The plan retrieval link is a link for retrieving the query optimization plan used by the pending historical request.

第二标签为当前查询优化请求经过相同哈希函数计算后得到的哈希值。The second tag is the hash value obtained after the current query optimization request is calculated using the same hash function.

比如,由于第一标签为历史查询优化请求经过哈希函数计算后得到的哈希值,第二标签为当前查询优化请求经过哈希函数计算后得到的哈希值,因此,便于通过第二标签,快速从第一标签中查找到与第二标签相同的哈希值的标签,从而快速将该标签的历史查询优化请求作为与当前查询优化请求相同的待处理历史请求,避免遍历历史查询优化请求。 For example, since the first label is the hash value obtained by calculating the historical query optimization request through the hash function, and the second label is the hash value obtained by calculating the current query optimization request through the hash function, it is convenient to quickly find the label with the same hash value as the second label from the first label through the second label, so as to quickly treat the historical query optimization request of the label as the same historical request to be processed as the current query optimization request, avoiding traversing the historical query optimization requests.

在一些实施例中,考虑到数据可能分布式存储在多个节点中,为了可以快速获取查询数据,在通过待构建子表达式和关联关系构建数据查询计划之后,还包括:In some embodiments, considering that data may be distributed and stored in multiple nodes, in order to quickly obtain query data, after constructing a data query plan through the sub-expressions to be constructed and the association relationships, the following is further included:

确定数据查询计划对应的查询指令,以及查询指令的副本指令;Determine the query instruction corresponding to the data query plan, and the copy instruction of the query instruction;

将副本指令发送至数据库的多个数据存储节点中,以使数据存储节点依据副本指令返回查询数据。The replica instruction is sent to multiple data storage nodes of the database, so that the data storage nodes return the query data according to the replica instruction.

其中,查询指令为数据查询计划的表现形式,以便数据库可以通过查询指令返回查询数据。副本指令为复制查询指令所得到的指令。The query instruction is a representation of a data query plan so that the database can return query data through the query instruction. The copy instruction is an instruction obtained by copying the query instruction.

数据存储节点为数据库存储数据的节点,每个数据存储节点依据接收到的副本指令返回数据。The data storage node is a node for storing data in the database, and each data storage node returns data according to the received replica instruction.

比如,有上述可知,待构建子表达式3用于从待构建子表达式1和待构建子表达式2中获取的数据中筛选数据,如此,在接收到每个数据存储节点依据副本指令返回的数据后,再依据待构建子表达式3筛选返回的数据。For example, as can be seen from the above, sub-expression 3 to be constructed is used to filter data from the data obtained from sub-expression 1 to be constructed and sub-expression 2 to be constructed. In this way, after receiving the data returned by each data storage node according to the replica instruction, the returned data is filtered according to sub-expression 3 to be constructed.

可以理解的是通过数据查询计划查询的数据可以应用于数据分析等领域,比如,广告内容查询、金融风控分析、信息推荐等领域。It is understandable that the data queried through the data query plan can be applied to fields such as data analysis, such as advertising content query, financial risk control analysis, information recommendation, etc.

在一些实施例中,针对广告内容查询时,通过数据查询计划可以快速查询广告,并在应用程序播放的内容中展示广告。In some embodiments, when querying for advertising content, advertisements can be quickly queried through a data query plan, and the advertisements can be displayed in the content played by the application.

比如,需要在一个应用程序播放的内容中展示特定类型的广告,考虑到相关技术提供的查询优化器采用的是多阶段查询优化,每个优化阶段都是独立优化的,导致每个阶段的优化策略可能存在冲突,导致查询过程中不确定性和错误率较高,如此,通过相关技术提供的查询优化器查询目标类型的广告时,其查询效率较低。For example, it is necessary to display a specific type of advertisement in the content played by an application. Considering that the query optimizer provided by the relevant technology adopts multi-stage query optimization, each optimization stage is optimized independently, which may lead to conflicts in the optimization strategies of each stage, resulting in high uncertainty and error rate in the query process. Therefore, when querying the target type of advertisements through the query optimizer provided by the relevant technology, its query efficiency is low.

为了可以提高查询特定类型的广告的效率,可以通过等效子表达式扩展数据查询表达式中的当前子表达式,并从等效子表达式和当前子表达式中筛选出查询效率高的待构建子表达式,并使多个待构建子表达式依据当前子表达式之间的关联关系构建数据查询计划,避免数据查询计划中存在冲突或重叠的步骤,进一步加快了特定类型的广告的查询效率,如此,可以快速将特定类型的广告展示在应用程序播放的内容中。In order to improve the efficiency of querying specific types of advertisements, the current sub-expression in the data query expression can be expanded through an equivalent sub-expression, and the sub-expressions to be constructed with high query efficiency can be filtered out from the equivalent sub-expressions and the current sub-expression, and multiple sub-expressions to be constructed can be used to construct a data query plan based on the association relationship between the current sub-expressions, thereby avoiding conflicting or overlapping steps in the data query plan, further speeding up the query efficiency of specific types of advertisements. In this way, specific types of advertisements can be quickly displayed in the content played by the application.

在一些实施例中,数据查询计划构建还可以应用于金融风控分析领域。In some embodiments, data query plan construction can also be applied to the field of financial risk control analysis.

比如,金融风控分析可以在用户分析金融产品时,通过本申请构建的数据查询计划可以快速查询金融产品有关的数据,并将该金融产品有关的数据推荐给用户。For example, when a user analyzes a financial product, financial risk control analysis can quickly query data related to the financial product through the data query plan constructed by this application, and recommend the data related to the financial product to the user.

在一些实施例中,数据查询计划构建还可以应用于信息推荐领域。In some embodiments, data query plan construction can also be applied to the field of information recommendation.

比如,用户在查询信息时,通过本申请构建的数据查询计划,可以快速查询信息,并将查询信息推荐给用户。For example, when a user searches for information, the data query plan constructed by this application can quickly search for information and recommend the searched information to the user.

由上可知,本申请实施例可以获取数据查询表达式,数据查询表达式包括关键词,同时数据查询表达式可以存在多个基础子表达式,每个基础子表达式之间存在关联关系,故可以根据关键词,得到数据查询表达式中的基础子表达式,以及基础子表达式之间的关联关系。然后对基础子表达式进行扩展,得到基础子表达式的等效子表达式,由于等效子表达式和基础子表达式具有等效的表达含义,如此,等效子表达式具有基础子表达式的关联关系,从基础子表达式和等效子表达式中可以筛选出待构建子表达式,待构建子表达式可 以是基础子表达式或等效子表达式。接着通过待构建子表达式和关联关系构建数据查询计划,多个待构建子表达式可以依据关联关系构建数据查询计划,使多个待构建子表达式在构建数据查询计划时不存在相互冲突,如此,便于构建数据查询计划,提升了查询性能。As can be seen from the above, the embodiment of the present application can obtain a data query expression, which includes keywords. At the same time, the data query expression can have multiple basic sub-expressions, and there is an association relationship between each basic sub-expression. Therefore, the basic sub-expressions in the data query expression and the association relationship between the basic sub-expressions can be obtained according to the keywords. Then the basic sub-expression is expanded to obtain the equivalent sub-expression of the basic sub-expression. Since the equivalent sub-expression and the basic sub-expression have equivalent expression meanings, the equivalent sub-expression has an association relationship with the basic sub-expression. The sub-expression to be constructed can be screened out from the basic sub-expression and the equivalent sub-expression. The sub-expression to be constructed can be It is a basic sub-expression or an equivalent sub-expression. Then, a data query plan is constructed through the sub-expression to be constructed and the associated relationship. Multiple sub-expressions to be constructed can construct a data query plan based on the associated relationship, so that multiple sub-expressions to be constructed do not conflict with each other when constructing a data query plan. In this way, it is easy to construct a data query plan and improve query performance.

根据上述实施例所描述的方法,以下将作进一步详细说明。The method described in the above embodiment will be further described in detail below.

在本实施例中,将以查询系统为例,对本申请实施例的方法进行详细说明。In this embodiment, the query system is taken as an example to describe the method of the embodiment of the present application in detail.

如图2a所示,一种查询系统,查询系统包括数据库管理系统和查询优化器,查询优化器位于数据库管理系统外,其中,查询优化器用于构建数据查询计划,数据库管理系统具体包括:As shown in FIG. 2a , a query system includes a database management system and a query optimizer. The query optimizer is located outside the database management system. The query optimizer is used to construct a data query plan. The database management system specifically includes:

解析器,用于对接收的数据查询命令进行语法分析,以得到数据查询命令的语法树;A parser, used for performing syntax analysis on the received data query command to obtain a syntax tree of the data query command;

查询转换器,用于接收解析器发送的语法树,并采用预设数据交换语言,对语法树进行格式转换处理,得到数据查询表达式,以使查询优化器依据数据查询表达式构建数据查询计划;A query converter is used to receive the syntax tree sent by the parser, and use a preset data exchange language to perform format conversion processing on the syntax tree to obtain a data query expression, so that the query optimizer can construct a data query plan based on the data query expression;

元数据提供器,用于在查询优化器构建数据查询计划时,接收查询优化器发送的元数据信息的获取请求;The metadata provider is used to receive a request for obtaining metadata information sent by the query optimizer when the query optimizer constructs a data query plan;

其中,元数据提供器是查询优化器与数据库管理系统之间进行元数据信息交换的接口。元数据提供组件是特定于数据库管理系统的插件,用于从数据库管理系统中检索元数据信息。The metadata provider is an interface for exchanging metadata information between the query optimizer and the database management system. The metadata provider component is a plug-in specific to the database management system and is used to retrieve metadata information from the database management system.

目录,用于接收元数据提供器发送的获取请求,并将获取请求通过元数据提供器发送至元数据交换器,以使获取请求通过元数据交换器可以从目录中获取元数据信息,并将元数据信息发送至解析器;The directory is used to receive the acquisition request sent by the metadata provider, and send the acquisition request to the metadata exchanger through the metadata provider, so that the acquisition request can obtain metadata information from the directory through the metadata exchanger, and send the metadata information to the parser;

其中,目录为数据库管理系统中保存元数据信息的数据库,元数据信息为描述数据库中数据的数据信息。The directory is a database in a database management system that stores metadata information, and the metadata information is data information that describes the data in the database.

元数据交换器:使用预设数据交换语言(EL语言)交换元数据信息。EL语言是一种XML格式的数据交换语言,它包含了数据库管理系统中的元数据信息,如表定义、操作符定义等,通过EL语言的获取请求,可以与不同的后端数据库管理系统进行交互,并获取所需要的元数据信息。同时也可以通过元数据交换器将元数据信息从数据库系统中提取出来并转换为EL语言,以实现查询优化器与数据库管理系统的解耦;Metadata exchanger: Use the preset data exchange language (EL language) to exchange metadata information. EL language is a data exchange language in XML format, which contains metadata information in the database management system, such as table definition, operator definition, etc. Through the acquisition request of EL language, it can interact with different back-end database management systems and obtain the required metadata information. At the same time, the metadata exchanger can also extract metadata information from the database system and convert it into EL language to achieve decoupling of the query optimizer and the database management system;

预设数据交换语言(EL语言):指的是一个用于交换查询优化器和数据库管理系统之间信息的框架。将查询优化器从数据库管理系统中分离出来需要构建一种处理查询的通信机制。该框架使用基于XML的语言编码通信所需的必要信息,例如输入查询、输出计划和元数据。在数据交换语言框架上使用的是一个简单的通信协议,用于获取元数据信息和发送数据查询计划;EL language: refers to a framework for exchanging information between the query optimizer and the database management system. Separating the query optimizer from the database management system requires building a communication mechanism for processing queries. The framework uses an XML-based language to encode the necessary information required for communication, such as input queries, output plans, and metadata. A simple communication protocol is used on the data exchange language framework to obtain metadata information and send data query plans;

计划转换器,用于将数据查询计划转换为查询指令;A plan converter, used to convert a data query plan into a query instruction;

执行器,用于接收计划转换器发送的查询指令,并依据查询指令返回查询数据。The executor is used to receive the query instructions sent by the plan converter and return the query data according to the query instructions.

其中,执行器执行:将最终查询指令的副本(副本指令)分派到每个分段(数据存储节点)中。在分布式查询执行期间,每个分段上的分配执行器同时充当数据的发送方和接 收方。例如运行在段S上的Redistribute(T2.b)实例基于T2.b的哈希值向其他段发送S上的元组,并且还从其他段上的其他Redistribute(T2.b)实例接收元组。The executor performs: dispatching a copy of the final query instruction (replica instruction) to each segment (data storage node). During distributed query execution, the assigned executor on each segment acts as both the sender and receiver of data. Receiver. For example, the Redistribute(T2.b) instance running on segment S sends tuples on S to other segments based on the hash value of T2.b, and also receives tuples from other Redistribute(T2.b) instances on other segments.

其中,Redistribute(T2.b)指示表T2的第b列。Here, Redistribute(T2.b) indicates the b-th column of table T2.

图2a中查询优化器的输入是数据查询表达式(EL查询)。查询优化器的输出是数据查询计划(EL计划)。在查询优化器构建数据查询计划时,可以查询数据库管理系统以获取元数据信息(例如表定义)。查询优化器通过允许数据库管理系统注册元数据提供程序来抽象元数据访问详情,负责将元数据信息序列化为EL查询后发送到查询优化器。还可以从包含以EL语言格式序列化的元数据对象的文件中查询元数据信息。数据库管理系统需要包括以EL格式消费/发出数据的转换器。查询转换器将语法树转换为数据查询表达式,而计划转换器将数据查询计划转换为可执行计划。这种计划转换器的实现完全在优化器之外完成,这允许多个数据库管理系统通过提供适当的计划转换器来使用查询优化器。The input of the query optimizer in Figure 2a is a data query expression (EL query). The output of the query optimizer is a data query plan (EL plan). When the query optimizer builds a data query plan, it can query the database management system to obtain metadata information (such as table definitions). The query optimizer abstracts metadata access details by allowing the database management system to register a metadata provider, which is responsible for serializing the metadata information into an EL query and sending it to the query optimizer. Metadata information can also be queried from a file containing metadata objects serialized in the EL language format. The database management system needs to include a converter that consumes/emit data in EL format. The query converter converts the syntax tree into a data query expression, while the plan converter converts the data query plan into an executable plan. The implementation of this plan converter is completed entirely outside the optimizer, which allows multiple database management systems to use the query optimizer by providing appropriate plan converters.

通过查询优化器位于数据库管理系统外,查询优化器的体系结构可以高度可扩展,查询优化器和数据库管理系统中的所有组件都可以单独替换并分别配置。By locating the query optimizer outside the database management system, the query optimizer architecture can be highly extensible, and all components in the query optimizer and the database management system can be replaced and configured separately.

如图2b所示,查询优化器具体包括:As shown in Figure 2b, the query optimizer specifically includes:

备忘录,用于将查询优化器构建的数据查询计划以一种紧凑的内存数据结构编码,并存储在备忘录中;Memo, used to encode the data query plan constructed by the query optimizer in a compact memory data structure and store it in the memo;

其中,备忘录的结构由称为表达式组的一组容器组成,其中每个表达式组包含逻辑上等效的表达式。备忘录表达式组捕获数据查询表达式的不同子查询表达式(例如表上的过滤器或两个表的表连接),表达式组的组成员(基础子表达式和基础子表达式的等效子表达式)以不同的逻辑方式实现同一数据查询(例如不同的连接顺序)。每个表达式组内的基础子表达式或基础子表达式的等效子表达式通过查询运算符构成,备忘录的递归结构允许紧凑编码庞大的数据,以通过构建数据查询计划的空间。The structure of the memo consists of a set of containers called expression groups, where each expression group contains logically equivalent expressions. The memo expression group captures different subquery expressions of a data query expression (such as a filter on a table or a table join of two tables), and the group members of the expression group (base subexpressions and equivalent subexpressions of the base subexpressions) implement the same data query in different logical ways (such as different join orders). The base subexpressions or equivalent subexpressions of the base subexpressions within each expression group are constructed through query operators. The recursive structure of the memo allows compact encoding of huge data to build a data query plan space.

搜索器,用于使用搜索机制,通过表达式组中的待构建子表达式构建初始子查询计划,并从初始子查询计划中估算出查询成本最小的待处理子查询计划;A searcher, used for using a search mechanism to construct an initial subquery plan through the sub-expressions to be constructed in the expression group, and estimating a subquery plan to be processed with the minimum query cost from the initial subquery plan;

任务调度器,用于启动搜索机制,通过任务调度器创建并行的工作单元以执行查询优化的三个主要步骤:探索(生成等效的逻辑表达式),实现(构建初始子查询计划)和优化(优化初始子查询计划),其中,强制执行所需的物理属性(例如排序顺序);A task scheduler is used to start the search mechanism, through which parallel work units are created to perform the three main steps of query optimization: exploration (generating equivalent logical expressions), realization (building initial subquery plans), and optimization (optimizing initial subquery plans), where the required physical properties (such as sort order) are enforced;

优化工具,用于为搜索器提供搜索机制,优化工具包括:Optimization tools are used to provide search mechanisms for search engines. The optimization tools include:

变换组件,用于确定等效子表达式,等效子表达式和基础子表达式具有等效的表达含义;A transformation component, used for determining an equivalent sub-expression, wherein the equivalent sub-expression and the basic sub-expression have equivalent expression meanings;

比如,通过应用转换规则生成计划备选方案,这些规则可以产生等效的逻辑表达式(例如Inner Join(T1,T2)→Inner Join(T2,T1))或现有子表达式的物理实现(例如Join(T1,T2)→Hash Join(T2,T1))。应用变换规则的结果被复制到备忘录中,这可能会导致创建新的表达式组和或向现有表达式组添加新的表达式。每个变换规则都是一个独立的组件,可以在系统配置中明确激活以及停用。For example, plan alternatives are generated by applying transformation rules, which can produce equivalent logical expressions (e.g. Inner Join(T1, T2) → Inner Join(T2, T1)) or physical realizations of existing sub-expressions (e.g. Join(T1, T2) → Hash Join(T2, T1)). The results of applying transformation rules are copied to the memo, which may result in the creation of new expression groups and/or the addition of new expressions to existing expression groups. Each transformation rule is an independent component that can be explicitly activated and deactivated in the system configuration.

其中,连接(Join)用于把两个或多个表中的行按照特定条件进行组合,产生一个更大的、包含了源表中所有信息的新表。哈希连接(Hash Join)是将Join的两个表通过哈希 (Hash)函数分别映射到若干个桶中,然后对每个桶进行Join操作,最后将Join的结果合并起来。Among them, Join is used to combine the rows in two or more tables according to specific conditions to generate a larger new table that contains all the information in the source table. Hash Join is to join the two tables through hash. The (Hash) function is mapped to several buckets respectively, and then a Join operation is performed on each bucket, and finally the Join results are merged.

强制执行属性,用于将强制执行所需的物理属性插入至初始子查询计划中;Enforcement attributes, used to insert the physical attributes required for enforcement into the initial subquery plan;

可以理解为,查询优化器包括一个可扩展的框架,用于基于正式属性规范描述查询要求和计划特性。属性具有不同的类型,包括逻辑属性(例如输出列)、物理属性(例如排序顺序和数据分布)和标量属性(例如输入条件中使用的列)。在查询优化器构建数据查询计划时,每个查询运算符可能会从通过待构建子表达式请求特定属性。优化的初始子查询计划(子查询计划)可能会自动满足待构建子表达式关联的当前查询优化请求(所需的属性(例如索引扫描(IndexScan)计划提供排序数据)),但是很多时候需要通过强制执行属性将一些属性插入到初始子查询计划中(例如初始子查询计划中包含Sort排序操作,需要将Sort操作符插入到计划中)。框架允许每个查询运算符根据子查询计划的属性和查询运算符的本地行为控制强制执行器的设置,作用是确保查询数据满足特定的要求,比如输出列的顺序、数据的分布等。It can be understood that the query optimizer includes an extensible framework for describing query requirements and plan characteristics based on formal attribute specifications. Attributes are of different types, including logical attributes (such as output columns), physical attributes (such as sort order and data distribution), and scalar attributes (such as columns used in input conditions). When the query optimizer builds a data query plan, each query operator may request specific attributes from the sub-expression to be built. The optimized initial sub-query plan (sub-query plan) may automatically meet the current query optimization request associated with the sub-expression to be built (the required attributes (such as the index scan (IndexScan) plan provides sorted data)), but many times it is necessary to insert some attributes into the initial sub-query plan through enforcement attributes (for example, the initial sub-query plan contains a Sort sorting operation, and the Sort operator needs to be inserted into the plan). The framework allows each query operator to control the settings of the enforcer based on the attributes of the sub-query plan and the local behavior of the query operator, so as to ensure that the query data meets specific requirements, such as the order of output columns, the distribution of data, etc.

元数据缓存,用于缓存待查询数据的元数据信息;Metadata cache, used to cache metadata information of data to be queried;

查询成本估计,用于通过元数据缓存中的元数据信息,对搜索器构建的初始子查询计划进行查询成本估计,得到初始子查询计划的查询成本。The query cost estimation is used to estimate the query cost of the initial sub-query plan constructed by the searcher through the metadata information in the metadata cache, and obtain the query cost of the initial sub-query plan.

比如,由于元数据(例如表定义)很少更改,每次查询都发送元数据信息会带来很大的开销,因此查询系统在查询优化器侧缓存元数据信息,只有在缓存中找不到元数据信息或发生了变更时才会从目录中查询元数据信息。For example, since metadata (such as table definitions) rarely changes, sending metadata information for each query will incur a lot of overhead. Therefore, the query system caches metadata information on the query optimizer side and only queries metadata information from the directory when the metadata information is not found in the cache or has changed.

例如,将用户输入的查询语言转换为结构化查询语言(Structured Query Language,SQL)(即数据查询命令),其中,SQL是一种用于处理关系型数据库的标准计算机语言,数据查询命令可以是(SELECT T1.a FROM T1,T2 WHERE T1.a=T2.b ORDER BY T1.a),其中,表T1的分布为按照表T1中第a列的哈希值进行分组,可以表示为Hashed(T1.a)中,表T2的分布为按照表T1中第a列的哈希值进行分组,可以表示为Hashed(T2.a)中,通过以下示例说明查询优化器构建数据查询计划:For example, the query language input by the user is converted into Structured Query Language (SQL) (i.e., data query command), where SQL is a standard computer language for processing relational databases. The data query command may be (SELECT T1.a FROM T1, T2 WHERE T1.a = T2.b ORDER BY T1.a), where the distribution of table T1 is grouped according to the hash value of the a-th column in table T1, which can be expressed as Hashed(T1.a), and the distribution of table T2 is grouped according to the hash value of the a-th column in table T1, which can be expressed as Hashed(T2.a). The following example illustrates how the query optimizer constructs a data query plan:

(一)、采用预设数据交换语言,将数据查询命令转换为EL格式的数据查询表达式。(i) Using the preset data exchange language, the data query command is converted into a data query expression in EL format.

比如,将数据查询命令转换为EL格式,其中以XML格式包含了所需的输出列、排序列、数据分布和查询逻辑。元数据信息(例如表和操作符定义)被装饰为元数据标识(Identifier,ID),以允许在查询优化器构建数据查询计划时,请求更多信息。元数据ID是由数据库系统标识符、对象标识符和版本号组成的唯一标识符。EL格式的数据查询表达式被发送到查询优化器中,解析并转换为内存中的逻辑表达式树,然后被复制到备忘录中。For example, the data query command is converted into EL format, which contains the required output columns, sorting columns, data distribution, and query logic in XML format. Metadata information (such as table and operator definitions) is decorated as a metadata identifier (ID) to allow the query optimizer to request more information when building a data query plan. The metadata ID is a unique identifier consisting of a database system identifier, an object identifier, and a version number. The data query expression in EL format is sent to the query optimizer, parsed and converted into an in-memory logical expression tree, and then copied to the memo.

(二)、根据关键词,得到数据查询表达式中的基础子表达式,以及基础子表达式之间的关联关系。(ii) According to the keywords, basic sub-expressions in the data query expression and the association relationship between the basic sub-expressions are obtained.

比如,如图2c所示,通过数据查询命令(SELECT T1.a FROM T1,T2 WHERE T1.a=T2.b ORDER BY T1.a),可以得到数据查询表达式中的关键词检索表T1、检索表T2和内连接(T1.a=T2.b)(Inner Join(T1.a=T2.b)),通过检索表T1、检索表T2和Inner Join(T1.a=T2.b)创建三个基础子表达式。 For example, as shown in Figure 2c, through the data query command (SELECT T1.a FROM T1,T2 WHERE T1.a=T2.b ORDER BY T1.a), the keyword retrieval table T1, retrieval table T2 and inner join (T1.a=T2.b)) in the data query expression can be obtained, and three basic sub-expressions are created through the retrieval table T1, retrieval table T2 and Inner Join (T1.a=T2.b).

出于简洁考虑省略了连接条件,表达式组0(Group0)称为根组,因为它对应于逻辑计划的根,逻辑计划中运算符之间的依赖关系被捕获为表达式组之间的引用,即Inner Join对应的基础子表达式属于根组,表T1和表T2对应的基础子表达式属于子组。例如Inner Join[T1,T2]是指表达式组1(Group1)和表达式组2(Group2)作为子组并连接。查询优化按以下步骤进行:For the sake of brevity, the join conditions are omitted. Expression group 0 (Group0) is called the root group because it corresponds to the root of the logical plan. The dependencies between operators in the logical plan are captured as references between expression groups, that is, the basic sub-expressions corresponding to Inner Join belong to the root group, and the basic sub-expressions corresponding to tables T1 and T2 belong to sub-groups. For example, Inner Join [T1, T2] means that expression group 1 (Group1) and expression group 2 (Group2) are connected as sub-groups. Query optimization is performed in the following steps:

(1)探索:触发产生其他等效逻辑计划表达式的转换规则(即确定等效子表达式,等效子表达式和对应的基础子表达式具有等效的表达含义)。(1) Exploration: triggering the transformation rules that generate other equivalent logical plan expressions (i.e., determining equivalent sub-expressions, which have equivalent expression meanings to the corresponding basic sub-expressions).

例如,将Inner Join[T1,T2]生成为Inner Join[T2,T1]。并将新生成的等效表达式添加到现有根组中,并可能创建新的表达式组。备忘录结构具有基于表达式拓扑的内置重复检测机制,用于检测并消除由不同转换创建的任何重复表达式。For example, Inner Join[T1,T2] is generated as Inner Join[T2,T1]. The newly generated equivalent expression is added to the existing root group and may create a new expression group. The memo structure has a built-in duplicate detection mechanism based on the expression topology to detect and eliminate any duplicate expressions created by different transformations.

(2)派生统计:在探索结束时备忘录维护给定查询的完整逻辑空间。然后查询优化器的派生统计机制被触发,以计算备忘录组的统计对象。统计对象主要是用于计算基数和数据倾斜度的列直方图(即待查询数据的元数据信息)。派生统计是在紧凑的备忘录结构上进行的,以避免扩展搜索空间。系统选择具有最高可靠统计的表达式为待构建子表达式组计算派生统计数据,统计计算是基于待构建子表达式的。(2) Derived Statistics: At the end of the exploration, the memo maintains the complete logical space of a given query. The derived statistics mechanism of the query optimizer is then triggered to calculate the statistical objects of the memo group. The statistical objects are mainly used to calculate the column histograms of cardinality and data skewness (i.e., metadata information of the data to be queried). Derived statistics are performed on a compact memo structure to avoid expanding the search space. The system selects the expression with the highest reliable statistics to calculate the derived statistical data for the group of sub-expressions to be constructed, and the statistical calculation is based on the sub-expressions to be constructed.

例如,具有少数连接条件(第一条件数量)的Inner Join表达式(基础子表达式)比具有更多连接条件(第二条件数量)的另一个等效Inner Join表达式(等效子表达式)的查询成本小(这种情况可能出现在生成多个连接顺序时)。因为连接条件的数量越多,传播和放大误差可能越高。由于需要在给定待构建子表达式的所有节点上聚合置信度分数,因此计算基数的置信度得分非常具有挑战。在根组中选择查询成本最小的待构建子表达式后,系统递归地在待构建子表达式的子组上触发派生统计。最后通过将子组的统计对象组合,构造为根组的统计对象(待查询数据的元数据信息)。For example, an Inner Join expression (base subexpression) with a few join conditions (first number of conditions) has a smaller query cost than another equivalent Inner Join expression (equivalent subexpression) with more join conditions (second number of conditions) (this may occur when generating multiple join orders). Because the greater the number of join conditions, the higher the propagation and amplification errors may be. Calculating the confidence score of the cardinality is very challenging because the confidence scores need to be aggregated on all nodes of a given subexpression to be constructed. After selecting the subexpression to be constructed with the smallest query cost in the root group, the system recursively triggers derived statistics on the subgroups of the subexpression to be constructed. Finally, by combining the statistical objects of the subgroups, the statistical object of the root group (metadata information of the data to be queried) is constructed.

如2d所示,展示了运行示例的派生统计机制。首先进行自上而下的遍历,其中根组的子表达式从其子组的子表达式(Group1和Group2)请求统计数据(表T1中第a列(T1.a)的元数据信息和表T2中第b列(T2.b)的元数据信息)。例如Inner Join(T1.a=T2.b)请求T1.a和T2.b的直方图。所请求的直方图通过元数据提供器从目录中按需加载、解析为EL查询(数据查询表达式)并存储在元数据缓存中,以服务于将来的请求。接下来进行自下而上的遍历,将子统计对象组合成根组的统计对象(待查询数据的元数据信息)。因为连接条件可能会影响列直方图,这会对T1.a和T2.b的(可能经过修改的)直方图进行组合。构造的统计对象附加到单个组中,可以在优化期间增量更新(例如通过添加新的直方图)。这对于保持派生统计的成本可管理至关重要。As shown in Figure 2d, the derived statistics mechanism of the running example is shown. First, a top-down traversal is performed, where the sub-expression of the root group requests statistics (metadata information of column a in table T1 (T1.a) and metadata information of column b in table T2 (T2.b)) from the sub-expressions of its sub-groups (Group1 and Group2). For example, Inner Join(T1.a=T2.b) requests the histograms of T1.a and T2.b. The requested histograms are loaded on demand from the catalog through the metadata provider, parsed into EL queries (data query expressions) and stored in the metadata cache to serve future requests. Next, a bottom-up traversal is performed to combine the sub-statistics objects into the statistics object of the root group (metadata information of the data to be queried). Because the join condition may affect the column histogram, this combines the (possibly modified) histograms of T1.a and T2.b. The constructed statistics objects are attached to a single group and can be incrementally updated during optimization (for example, by adding new histograms). This is crucial to keep the cost of derived statistics manageable.

(3)实现:触发创建逻辑表达式的物理实现的转换规则。例如,触发扫描规则,将逻辑(Get)生成为物理表扫描(Scan)。(3) Implementation: Triggering transformation rules that create the physical implementation of a logical expression. For example, triggering a scan rule to generate a logical (Get) into a physical table scan (Scan).

(4)优化:在此步骤中执行属性强制和查询成本估算方案。优化始于向备忘录的根表达式组提交查询优化请求,指定查询要求如结果分布和排序顺序。向根表达式组提交请求相当于请求根表达式组中物理运算符中满足请求的最小查询成本计划。 (4) Optimization: In this step, attribute enforcement and query cost estimation schemes are performed. Optimization begins by submitting a query optimization request to the root expression group of the memo, specifying query requirements such as result distribution and sort order. Submitting a request to the root expression group is equivalent to requesting the minimum query cost plan that satisfies the request among the physical operators in the root expression group.

对于每个查询优化请求,根表达式组中的子表达式根据传入的查询优化请求和运算符的本地要求,传递相应的请求给子组中的子表达式。在优化期间同一组可能会提交多个相同的查询优化请求。系统将查询优化请求缓存到哈希表中。只有在哈希表中不存在查询优化请求时才计算传入请求(以通过查询优化请求构建子查询计划)。此外每个子表达式都会维护本地哈希表,将查询优化请求映射到相应的历史查询请求。本地哈希表提供了从备忘录中提取历史查询计划时使用的计划调取链接。For each query optimization request, the sub-expressions in the root expression group pass the corresponding request to the sub-expressions in the sub-group according to the incoming query optimization request and the local requirements of the operator. Multiple identical query optimization requests may be submitted by the same group during optimization. The system caches query optimization requests in a hash table. Incoming requests are only calculated if there is no query optimization request in the hash table (to build a sub-query plan through the query optimization request). In addition, each sub-expression maintains a local hash table to map query optimization requests to corresponding historical query requests. The local hash table provides a plan call link used when extracting historical query plans from the memo.

如图2e所示,展示了运行示例中的查询优化请求。第一表达式关联的当前查询优化请求指定查询结果需要基于表T1的第a列的顺序被收集到主节点上。图2e还包括了最佳组表达式所对应的组哈希表,以及插入备忘录中的排序(Sort)、重新分配(Redistribute)、复制(Replicate)操作符。Gather操作符从所有数据存储节点收集数据到主节点。GatherMerge操作符从所有数据存储节点收集排好序的数据到主节点,并保持排序顺序。Redistribute操作符根据给定参数的哈希值将元组分布到分段中。As shown in Figure 2e, the query optimization request in the running example is shown. The current query optimization request associated with the first expression specifies that the query results need to be collected on the master node based on the order of column a of table T1. Figure 2e also includes the group hash table corresponding to the best group expression, as well as the Sort, Redistribute, and Replicate operators inserted in the memo. The Gather operator collects data from all data storage nodes to the master node. The GatherMerge operator collects sorted data from all data storage nodes to the master node and maintains the sorted order. The Redistribute operator distributes tuples to segments based on the hash value of the given parameter.

图2e包括Group0中的查询运算符(嵌套循环连接[Group1,Group2](Inner NestLoopJoin[Group1,Group2])、Inner NestLoopJoin[Group2,Group1]、Inner Hash Join[Group1,Group2]和Inner Hash Join[Group2,Group1、(Sort(T1.a)、Gather、GatherMerge(T1.a)),Group1中的查询运算符(Scan(T1)、Sort(T1.a)、Replicate),Group2中的查询运算符(Scan(T2)、Replicate、Redistribute(T2.b)),按照查询优化请求,通过Group1、Group2和Group0中查询运算符的构建右侧的数据查询计划。如图2f所示,展示了Inner Hash Join[T1,T2]的优化。对于这个查询优化请求,其中一种备选方案是根据连接条件对子节点分布进行对齐,这样要连接的数据就可以被共同定位。这是通过从第一表达式1请求Hashed(T1.a)分布和从第一表达式2请求Hashed(T2.b)分布来实现的。两个第一表达式都要求提供任何类型的排序顺序。在找到子Group(Group1和Group2)最佳计划之后(第一子查询计划),Inner Hash Join组合子属性以确定所提供的分布和排序顺序(最终优化请求)。其中,第一表达式2的最佳计划需要在T2.b上哈希分布T2,因为T2最初是在T2.a上哈希分布的,而第一表达式1的最佳计划是一个简单扫描,因为T1已经是在T1.a上哈希分布的。当确定所提供的属性不能满足最初的要求时(未满足请求),必须强制执行未满足的属性(最终优化请求)。系统中的属性执行采用灵活的框架,允许每个操作符、根据子计划和操作符本地行为来定义执行所需属性的行为。例如,保持顺序的嵌套循环连接(Nested Loops Join,NL Join)操作符如果外部子节点已经提供了排序顺序,则可能不需要在连接之上执行排序顺序。FIG2e includes the query operators in Group0 (Nested Loop Join [Group1, Group2] (Inner Nest Loop Join [Group1, Group2]), Inner Nest Loop Join [Group2, Group1], Inner Hash Join [Group1, Group2] and Inner Hash Join [Group2, Group1, (Sort (T1.a), Gather, GatherMerge (T1.a)), the query operators in Group1 (Scan (T1), Sort (T1.a), Replicate), the query operators in Group2 (Scan (T2), Replicate, Redistribute (T2.b)), and according to the query optimization request, the data query plan on the right is constructed by the query operators in Group1, Group2 and Group0. As shown in FIG2f, the optimization of Inner Hash Join [T1, T2] is shown. For this query optimization request, one of the alternatives is to align the distribution of child nodes according to the join condition, so that the nodes to be connected The data can be co-located. This is achieved by requesting a Hashed(T1.a) distribution from the first expression 1 and a Hashed(T2.b) distribution from the first expression 2. Both first expressions require any type of sort order to be provided. After finding the best plan for the subgroups (Group1 and Group2) (first subquery plan), the Inner Hash Join combines the sub-attributes to determine the provided distribution and sort order (final optimization request). Among them, the best plan for the first expression 2 requires hash distribution of T2 on T2.b, because T2 was originally hashed on T2.a. , while the best plan for the first expression 1 is a simple scan because T1 is already hash distributed on T1.a. When it is determined that the provided attributes do not satisfy the original requirements (unsatisfied request), the unsatisfied attributes must be enforced (final optimization request). The attribute enforcement in the system adopts a flexible framework that allows each operator, based on subplans and operator local behavior, to define the behavior of enforcing the required attributes. For example, the order-preserving Nested Loops Join (NL Join) operator may not need to enforce a sort order on the join if the outer child nodes already provide a sort order.

图2f展示了通过属性执行满足查询优化请求的两个可能Group0的初始子查询计划。左侧的数据查询计划在分段上对查询数据进行排序,然后在主节点上收集并合并排好序的结果。右侧的数据查询计划将查询数据从分段收集到主节点,然后对其进行排序。这些不同的备选方案编码在备忘录中,并由成本模型依据待查询数据的元数据信息计算初始子查询计划的查询成本(计算量)。最后基于查询优化器给出的查询结构,从备忘录中提取数据查询计划。图2f说明了正在运行的示例的计划提取。相关表达式都有对应的本地哈希表。每个本地哈希表将传入的优化请求映射到相应的子优化请求。首先在根表达式组的待构建 子表达式查找最佳组表达式为GatherMerge操作符。在GatherMerge的本地哈希表中,查找最佳的组表达式是Sort。因此将GatherMerge操作符链接到Sort。在Sort的本地哈希表中,对应的最佳组表达式是Inner Hash Join[T1,T2]。因此将Sort链接到Inner Hash Join。后面遵循相同的过程来完成计划提取,最终的数据查询计划,如图2f所示。提取的最佳数据查询计划以EL格式序列化并发送到数据库管理系统以供执行。Figure 2f shows two possible initial subquery plans of Group0 that satisfy the query optimization request through attribute execution. The data query plan on the left sorts the query data on the segment, and then collects and merges the sorted results on the main node. The data query plan on the right collects the query data from the segment to the main node and then sorts it. These different alternatives are encoded in the memo, and the cost model calculates the query cost (computational amount) of the initial subquery plan based on the metadata information of the data to be queried. Finally, based on the query structure given by the query optimizer, the data query plan is extracted from the memo. Figure 2f illustrates the plan extraction of the running example. The relevant expressions have corresponding local hash tables. Each local hash table maps the incoming optimization request to the corresponding sub-optimization request. First, in the root expression group to be constructed The sub-expression to find the best group expression is the GatherMerge operator. In the local hash table of GatherMerge, the best group expression to find is Sort. Therefore, the GatherMerge operator is linked to Sort. In the local hash table of Sort, the corresponding best group expression is Inner Hash Join [T1, T2]. Therefore, Sort is linked to Inner Hash Join. The same process is followed to complete the plan extraction, and the final data query plan is shown in Figure 2f. The extracted best data query plan is serialized in EL format and sent to the database management system for execution.

本申请主要面向数据分析等重要业务场景,例如大规模的实时数据和离线数据的查询系统。互联网每天产生海量的数据,因此给大数据查询系统带来了巨大的挑战,业务中的数据管理和查询系统已经在可扩展性、可用性和处理性能方面都取得了巨大的进展,使得数百万亿字节(TeraByte,TB)甚至千万亿字节(PetaByte,PB)的大数据集可以通过结构化查询语言(Structured Query Language,SQL)或类似SQL的接口进行分析和较快速的查询。互联网数据通常存储在分布式计算框架中(Hadoop)上,查询引擎将最初的SQL查询被编译成Spark或者MapReduce作业,其中,Spark为采用内存计算方式的分布式计算框架,MapReduce为采用磁盘读写IO方式的分布式计算框架。之前的查询优化器通常使用多阶段的查询优化,编译后的Spark或MapReduce作业性能较差,为此本申请设计了一种基于瀑布流树模型的查询计划构建方法,加速复杂查询的编码。经过实际数据测试相对之前的查询优化器可以带来数倍以上的查询加速。This application is mainly aimed at important business scenarios such as data analysis, such as large-scale real-time data and offline data query systems. The Internet generates a huge amount of data every day, which brings great challenges to the big data query system. The data management and query systems in the business have made great progress in scalability, availability and processing performance, so that large data sets of hundreds of trillions of bytes (TeraByte, TB) or even petabytes (PetaByte, PB) can be analyzed and queried more quickly through structured query language (Structured Query Language, SQL) or SQL-like interfaces. Internet data is usually stored in a distributed computing framework (Hadoop), and the query engine compiles the initial SQL query into Spark or MapReduce jobs, where Spark is a distributed computing framework that uses memory computing, and MapReduce is a distributed computing framework that uses disk read and write IO. Previous query optimizers usually use multi-stage query optimization, and the performance of compiled Spark or MapReduce jobs is poor. For this reason, this application designs a query plan construction method based on the waterfall tree model to accelerate the encoding of complex queries. After actual data testing, it can bring more than several times the query acceleration compared to previous query optimizers.

针对大数据系统上的SQL查询,业界一种常见的解决方法是使用Hive将查询转换为MapReduce任务,其中,Hive是一个基于Hadoop的数据仓库工具,Hadoop是一种分布式计算框架,但此方法可能导致交互分析性能不佳。为解决这个问题,业界已开发了多个专业查询引擎。但是这些方法往往只适用于特定的主机系统,并针对优化,而不能同时支持多种分布式系统的查询要求。由于查询优化器是分析查询处理性能的主要影响因素,而业务中有大量的大数据的复杂查询的处理需求,因此本申请设计了一种新型查询优化器的,用于大数据的分布式查询架构,业务可以基于此优化器快速开发新的优化技术和高级查询功能。本申请通过设计元数据缓存组件、备忘录、强制属性优化等技术来加速查询,并使用高效的多核调度程序来提高优化速度。For SQL queries on big data systems, a common solution in the industry is to use Hive to convert queries into MapReduce tasks, where Hive is a data warehouse tool based on Hadoop, and Hadoop is a distributed computing framework, but this method may lead to poor interactive analysis performance. To solve this problem, the industry has developed a number of professional query engines. However, these methods are often only applicable to specific host systems and are optimized, but cannot support the query requirements of multiple distributed systems at the same time. Since the query optimizer is the main influencing factor in analyzing query processing performance, and there are a large number of complex query processing requirements for big data in the business, this application designs a new type of query optimizer for a distributed query architecture for big data, and the business can quickly develop new optimization technologies and advanced query functions based on this optimizer. This application accelerates queries by designing metadata cache components, memos, mandatory attribute optimization and other technologies, and uses an efficient multi-core scheduler to increase the optimization speed.

本申请设计了一种新的查询优化器。基于瀑布流数模型优化框架的查询优化器。之前数据查询系统的查询优化器是与整个数据库管理系统紧密耦合在一起的,但是本申请的一个独有特点是设计的优化器可以作为单独的组件在数据库系统外运行。这种能力使得具有不同计算架构的产品可以共同使用同一个优化器成为可能。此外可以将优化器作为一个独立的产品部署并运行,并可以单独进行详细的测试和优化,而不必与数据库系统的结构耦合到一块。从而减少部署和测试的难度。The present application designs a new query optimizer. A query optimizer based on a waterfall model optimization framework. The query optimizer of the previous data query system was tightly coupled with the entire database management system, but a unique feature of the present application is that the designed optimizer can be run as a separate component outside the database system. This capability makes it possible for products with different computing architectures to use the same optimizer together. In addition, the optimizer can be deployed and run as an independent product, and can be tested and optimized separately in detail without being coupled to the structure of the database system. This reduces the difficulty of deployment and testing.

由上可知,本申请便于构建数据查询计划,提升查询效率。From the above, it can be seen that this application facilitates the construction of data query plans and improves query efficiency.

为了更好地实施以上方法,本申请实施例还提供一种查询计划构建装置,该查询计划构建装置具体可以集成在电子设备中,该电子设备可以为终端、服务器等设备。其中,终端可以为手机、平板电脑、智能蓝牙设备、笔记本电脑、个人电脑等设备;服务器可以是单一服务器,也可以是由多个服务器组成的服务器集群。 In order to better implement the above method, the embodiment of the present application also provides a query plan construction device, which can be integrated in an electronic device, and the electronic device can be a terminal, a server, etc. Among them, the terminal can be a mobile phone, a tablet computer, a smart Bluetooth device, a laptop, a personal computer, etc.; the server can be a single server or a server cluster composed of multiple servers.

比如,在本实施例中,将以查询计划构建装置具体集成在电子设备为例,对本申请实施例的方法进行详细说明。For example, in this embodiment, the method of the embodiment of the present application will be described in detail by taking the query plan construction device specifically integrated in an electronic device as an example.

例如,如图3所示,该查询计划构建装置可以包括获取单元310、关键单元320、等效单元330、确定单元340以及构建单元350,如下:For example, as shown in FIG. 3 , the query plan construction device may include an acquisition unit 310, a key unit 320, an equivalent unit 330, a determination unit 340, and a construction unit 350, as follows:

(一)、获取单元310。(i) Acquisition unit 310.

获取单元310,用于获取数据查询表达式,数据查询表达式包括关键词。The acquisition unit 310 is used to acquire a data query expression, where the data query expression includes keywords.

在一些实施例中,获取数据查询表达式,包括:In some embodiments, obtaining a data query expression includes:

获取数据查询命令,数据查询命令包括关键词;Get data query command, the data query command includes keywords;

根据关键词,对数据查询命令进行语法分析处理,得到数据查询命令的语法树;According to the keywords, the data query command is parsed and processed to obtain the syntax tree of the data query command;

采用预设数据交换语言,对语法树进行格式转换处理,得到数据查询表达式。The preset data exchange language is used to perform format conversion on the syntax tree to obtain a data query expression.

(二)、关键单元320。(ii) Key unit 320.

关键单元320,用于根据关键词,得到数据查询表达式中的基础子表达式,以及基础子表达式之间的关联关系。The key unit 320 is used to obtain the basic sub-expressions in the data query expression and the association relationship between the basic sub-expressions according to the keywords.

(三)、等效单元330。(iii) Equivalent unit 330.

等效单元330,用于对基础子表达式进行扩展,得到基础子表达式的等效子表达式,基础子表达式和基础子表达式的等效子表达式具有等效的表达含义。The equivalent unit 330 is used to expand the basic sub-expression to obtain an equivalent sub-expression of the basic sub-expression. The basic sub-expression and the equivalent sub-expression of the basic sub-expression have equivalent expression meanings.

(四)、确定单元340。(iv) determining unit 340.

确定单元340,用于从基础子表达式和基础子表达式的等效子表达式中确定待构建子表达式。The determining unit 340 is used to determine the sub-expression to be constructed from the basic sub-expression and the equivalent sub-expression of the basic sub-expression.

在一些实施例中,从基础子表达式和基础子表达式的等效子表达式中确定待构建子表达式,包括:In some embodiments, determining a sub-expression to be constructed from a basic sub-expression and an equivalent sub-expression of the basic sub-expression includes:

获取第一条件数量和第二条件数量,第一条件数量为基础子表达式关联的数据连接条件的条件数量,第二条件数量为基础子表达式的等效子表达式关联的数据连接条件的条件数量;Get a first condition number and a second condition number, the first condition number is the condition number of the data connection condition associated with the base sub-expression, and the second condition number is the condition number of the data connection condition associated with the equivalent sub-expression of the base sub-expression;

根据第一条件数量和第二条件数量,从基础子表达式和基础子表达式的等效子表达式中确定待构建子表达式。According to the first conditional quantity and the second conditional quantity, a sub-expression to be constructed is determined from a basic sub-expression and an equivalent sub-expression of the basic sub-expression.

在一些实施例中,从基础子表达式和基础子表达式的等效子表达式中确定待构建子表达式,包括:In some embodiments, determining a sub-expression to be constructed from a basic sub-expression and an equivalent sub-expression of the basic sub-expression includes:

获取基础子表达式关联的待查询数据的元数据信息;Obtain metadata information of the data to be queried associated with the basic sub-expression;

从元数据信息中获取待查询数据的数据量;Obtain the data volume of the data to be queried from the metadata information;

根据数据量,从基础子表达式和基础子表达式的等效子表达式中确定待构建子表达式。According to the amount of data, the sub-expression to be constructed is determined from the basic sub-expression and the equivalent sub-expression of the basic sub-expression.

(五)、构建单元350。(V) construct unit 350.

构建单元350,用于通过待构建子表达式和关联关系构建数据查询计划。The construction unit 350 is used to construct a data query plan through the sub-expressions to be constructed and the association relationships.

在一些实施例中,通过待构建子表达式和关联关系构建数据查询计划,包括:In some embodiments, constructing a data query plan using sub-expressions to be constructed and associated relationships includes:

从数据查询表达式中获取待构建子表达式关联的当前查询优化请求;Obtain the current query optimization request associated with the sub-expression to be constructed from the data query expression;

通过待构建子表达式构建初始子查询计划; Construct an initial subquery plan through the sub-expression to be constructed;

根据待构建子表达式关联的当前查询优化请求和关联关系,从初始子查询计划中确定待处理子查询计划;Determine the subquery plan to be processed from the initial subquery plan according to the current query optimization request and the association relationship associated with the subexpression to be constructed;

对待处理子查询计划进行合并处理,得到数据查询计划。The sub-query plans to be processed are merged to obtain a data query plan.

在一些实施例中,通过待构建子表达式构建初始子查询计划,包括:In some embodiments, constructing an initial subquery plan using the sub-expression to be constructed includes:

获取待构建子表达式关联的查询运算符;Get the query operator associated with the sub-expression to be constructed;

根据待构建子表达式,构建查询运算符之间的连接关系;According to the sub-expressions to be constructed, the connection relationship between the query operators is constructed;

控制查询运算符按照连接关系进行排布,得到初始子查询计划。The query operators are controlled to be arranged according to the connection relationship to obtain the initial subquery plan.

在一些实施例中,根据当前查询优化请求和关联关系,从初始子查询计划中确定待处理子查询计划,包括:In some embodiments, according to the current query optimization request and the association relationship, determining a to-be-processed sub-query plan from an initial sub-query plan includes:

根据待构建子表达式关联的当前查询优化请求,获取待构建子表达式关联的待查询数据的元数据信息;According to the current query optimization request associated with the sub-expression to be constructed, metadata information of the to-be-constructed data associated with the to-be-constructed sub-expression is obtained;

通过待查询数据的元数据信息,分析初始子查询计划执行时所需的计算量;Analyze the amount of computation required when executing the initial subquery plan based on the metadata information of the data to be queried;

根据计算量和关联关系,从初始子查询计划中确定待处理子查询计划。According to the computation amount and the association relationship, the subquery plan to be processed is determined from the initial subquery plan.

在一些实施例中,待构建子表达式包括第一表达式和第二表达式,第一表达式用于查询待查询数据,第二表达式用于筛选待查询数据,根据当前查询优化请求和关联关系,从初始子查询计划中确定待处理子查询计划,包括:In some embodiments, the sub-expression to be constructed includes a first expression and a second expression, the first expression is used to query the data to be queried, and the second expression is used to filter the data to be queried, and according to the current query optimization request and the association relationship, the sub-query plan to be processed is determined from the initial sub-query plan, including:

根据第一表达式关联的当前查询优化请求,从第一初始子查询计划中确定第一子查询计划,第一初始子查询计划为通过第一表达式构建的初始子查询计划;Determine a first subquery plan from a first initial subquery plan according to a current query optimization request associated with the first expression, where the first initial subquery plan is an initial subquery plan constructed by using the first expression;

基于第一子查询计划满足的当前查询优化请求,从第一表达式关联的当前查询优化请求中确定未满足请求;Based on the current query optimization request satisfied by the first subquery plan, determining unsatisfied requests from the current query optimization request associated with the first expression;

根据未满足请求和关联关系,从第二表达式关联的当前查询优化请求中确定最终优化请求,最终优化请求包括未满足请求;According to the unsatisfied requests and the association relationship, determine a final optimization request from the current query optimization requests associated with the second expression, where the final optimization request includes the unsatisfied request;

根据最终优化请求,从第二初始子查询计划中确定第二子查询计划,第二初始子查询计划为通过第二表达式构建的初始子查询计划,待处理子查询计划包括第一子查询计划和第二子查询计划。According to the final optimization request, a second subquery plan is determined from the second initial subquery plan, where the second initial subquery plan is an initial subquery plan constructed by using the second expression, and the subquery plans to be processed include the first subquery plan and the second subquery plan.

在一些实施例中,还包括:In some embodiments, it also includes:

获取待构建子表达式关联的历史查询优化请求,以及与历史查询优化请求对应的历史查询计划;Obtain the historical query optimization request associated with the sub-expression to be constructed, and the historical query plan corresponding to the historical query optimization request;

从历史查询优化请求中确定当前查询优化请求对应的待处理历史请求;Determine, from the historical query optimization requests, a pending historical request corresponding to the current query optimization request;

若存在待处理历史请求,将待处理历史请求对应的历史查询计划作为待处理子查询计划;If there are pending historical requests, the historical query plan corresponding to the pending historical request is used as the pending sub-query plan;

若不存在待处理历史请求,触发执行通过待构建子表达式构建初始子查询计划的步骤。If there is no pending historical request, the step of constructing an initial subquery plan through the sub-expression to be constructed is triggered.

在一些实施例中,获取待构建子表达式关联的历史查询优化请求,以及与历史查询优化请求对应的历史查询计划,包括:In some embodiments, obtaining a historical query optimization request associated with the sub-expression to be constructed and a historical query plan corresponding to the historical query optimization request includes:

获取待构建子表达式关联的哈希表,哈希表包括历史查询优化请求,以及历史查询优化请求关联的第一标签和计划调取链接,第一标签为历史查询优化请求经过哈希函数计算后得到的哈希值; Obtain a hash table associated with the sub-expression to be constructed, the hash table including historical query optimization requests, and first tags and plan retrieval links associated with the historical query optimization requests, the first tag being a hash value obtained after the historical query optimization request is calculated by a hash function;

从历史查询优化请求中确定当前查询优化请求对应的待处理历史请求,包括:Determine the pending historical request corresponding to the current query optimization request from the historical query optimization requests, including:

确定第二标签,第二标签为当前查询优化请求经过哈希函数计算后得到的哈希值;Determine a second tag, where the second tag is a hash value obtained after the current query optimization request is calculated by a hash function;

根据第一标签和第二标签,从历史查询优化请求中确定当前查询优化请求对应的待处理历史请求;Determine, from the historical query optimization requests, a to-be-processed historical request corresponding to the current query optimization request according to the first tag and the second tag;

若存在待处理历史请求,将待处理历史请求对应的历史查询计划作为待处理子查询计划,包括:If there are pending historical requests, the historical query plan corresponding to the pending historical request is used as the pending sub-query plan, including:

若存在待处理历史请求,采用待处理历史请求关联的计划调取链接,调取历史查询优化请求对应的历史查询计划作为待处理子查询计划。If there are pending historical requests, the plan retrieval link associated with the pending historical request is used to retrieve the historical query plan corresponding to the historical query optimization request as the pending sub-query plan.

在一些实施例中,在通过待构建子表达式构建数据查询计划之后,还包括:In some embodiments, after constructing a data query plan using the sub-expression to be constructed, the method further includes:

确定数据查询计划对应的查询指令,以及查询指令的副本指令;Determine the query instruction corresponding to the data query plan, and the copy instruction of the query instruction;

将副本指令发送至数据库的多个数据存储节点中,以使数据存储节点依据副本指令返回查询数据。The replica instruction is sent to multiple data storage nodes of the database, so that the data storage nodes return the query data according to the replica instruction.

具体实施时,以上各个单元可以作为独立的实体来实现,也可以进行任意组合,作为同一或若干个实体来实现,以上各个单元的具体实施可参见前面的方法实施例,在此不再赘述。In specific implementation, the above units can be implemented as independent entities, or can be arbitrarily combined to be implemented as the same or several entities. The specific implementation of the above units can refer to the previous method embodiments, which will not be repeated here.

由上可知,本实施例的查询计划构建装置由获取单元获取数据查询表达式,数据查询表达式包括关键词;由关键单元根据关键词,得到数据查询表达式中的当前子表达式,以及当前子表达式之间的关联关系;由等效单元对基础子表达式进行扩展,得到基础子表达式的等效子表达式,基础子表达式和基础子表达式的等效子表达式具有等效的表达含义;由确定单元从基础子表达式和基础子表达式的等效子表达式中确定待构建子表达式;由构建单元通过待构建子表达式和关联关系构建数据查询计划。由此,本申请实施例可以便于构建数据查询计划,提升查询效率。As can be seen from the above, the query plan construction device of this embodiment obtains a data query expression by an acquisition unit, and the data query expression includes keywords; the key unit obtains the current sub-expression in the data query expression and the association relationship between the current sub-expressions according to the keywords; the equivalent unit expands the basic sub-expression to obtain the equivalent sub-expression of the basic sub-expression, and the basic sub-expression and the equivalent sub-expression of the basic sub-expression have equivalent expression meanings; the determination unit determines the sub-expression to be constructed from the basic sub-expression and the equivalent sub-expression of the basic sub-expression; the construction unit constructs a data query plan through the sub-expression to be constructed and the association relationship. Therefore, the embodiment of the present application can facilitate the construction of a data query plan and improve query efficiency.

本申请实施例还提供一种电子设备,该电子设备可以为终端、服务器等设备。其中,终端可以为手机、平板电脑、智能蓝牙设备、笔记本电脑、个人电脑,等等;服务器可以是单一服务器,也可以是由多个服务器组成的服务器集群,等等。The embodiment of the present application also provides an electronic device, which can be a terminal, a server, etc. The terminal can be a mobile phone, a tablet computer, a smart Bluetooth device, a laptop, a personal computer, etc. The server can be a single server or a server cluster composed of multiple servers, etc.

在一些实施例中,该查询计划构建装置还可以集成在多个电子设备中,比如,查询计划构建装置可以集成在多个服务器中,由多个服务器来实现本申请的查询计划构建方法。In some embodiments, the query plan construction device can also be integrated into multiple electronic devices. For example, the query plan construction device can be integrated into multiple servers, and the query plan construction method of the present application can be implemented by multiple servers.

在本实施例中,将以本实施例的电子设备是服务器为例进行详细描述,比如,如图4所示,其示出了本申请实施例所涉及的服务器的结构示意图,具体来讲:In this embodiment, the electronic device of this embodiment is a server as an example for detailed description. For example, as shown in FIG. 4 , it shows a schematic diagram of the structure of the server involved in the embodiment of the present application. Specifically:

该服务器可以包括一个或者一个以上处理核心的处理器410、一个或一个以上计算机可读存储介质的存储器420、电源430、输入模块440以及通信模块450等部件。本领域技术人员可以理解,图4中示出的服务器结构并不构成对服务器的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。其中:The server may include one or more processing core processors 410, one or more computer-readable storage media memories 420, a power supply 430, an input module 440, and a communication module 450. Those skilled in the art will appreciate that the server structure shown in FIG. 4 does not limit the server, and may include more or fewer components than shown, or combine certain components, or arrange the components differently.

处理器410是该服务器的控制中心,利用各种接口和线路连接整个服务器的各个部分,通过运行或执行存储在存储器420内的软件程序和/或模块,以及调用存储在存储器420内的数据,执行服务器的各种功能和处理数据。在一些实施例中,处理器410可包括一个或多个处理核心;在一些实施例中,处理器410可集成应用处理器和调制解调处理器,其中, 应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器410中。The processor 410 is the control center of the server, and uses various interfaces and lines to connect various parts of the entire server. It executes various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 420, and calling data stored in the memory 420. In some embodiments, the processor 410 may include one or more processing cores; in some embodiments, the processor 410 may integrate an application processor and a modem processor, wherein: The application processor mainly processes the operating system, user interface, and application programs, and the modem processor mainly processes wireless communications. It is understandable that the modem processor may not be integrated into the processor 410 .

存储器420可用于存储软件程序以及模块,处理器410通过运行存储在存储器420的软件程序以及模块,从而执行各种功能应用以及数据处理。存储器420可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据服务器的使用所创建的数据等。此外,存储器420可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。相应地,存储器420还可以包括存储器控制器,以提供处理器410对存储器420的访问。The memory 420 can be used to store software programs and modules. The processor 410 executes various functional applications and data processing by running the software programs and modules stored in the memory 420. The memory 420 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application required for at least one function (such as a sound playback function, an image playback function, etc.), etc.; the data storage area may store data created according to the use of the server, etc. In addition, the memory 420 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device, a flash memory device, or other volatile solid-state storage devices. Accordingly, the memory 420 may also include a memory controller to provide the processor 410 with access to the memory 420.

服务器还包括给各个部件供电的电源430,在一些实施例中,电源430可以通过电源管理系统与处理器410逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。电源430还可以包括一个或一个以上的直流或交流电源、再充电系统、电源故障检测电路、电源转换器或者逆变器、电源状态指示器等任意组件。The server also includes a power supply 430 for supplying power to various components. In some embodiments, the power supply 430 can be logically connected to the processor 410 through a power management system, so that the power management system can manage charging, discharging, and power consumption. The power supply 430 can also include any components such as one or more DC or AC power supplies, recharging systems, power failure detection circuits, power converters or inverters, and power status indicators.

该服务器还可包括输入模块440,该输入模块440可用于接收输入的数字或字符信息,以及产生与用户设置以及功能控制有关的键盘、鼠标、操作杆、光学或者轨迹球信号输入。The server may further include an input module 440, which may be used to receive input digital or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function controls.

该服务器还可包括通信模块450,在一些实施例中通信模块450可以包括无线模块,服务器可以通过该通信模块450的无线模块进行短距离无线传输,从而为用户提供了无线的宽带互联网访问。比如,该通信模块450可以用于帮助用户收发电子邮件、浏览网页和访问流式媒体等。The server may further include a communication module 450. In some embodiments, the communication module 450 may include a wireless module. The server may perform short-range wireless transmission through the wireless module of the communication module 450, thereby providing wireless broadband Internet access to the user. For example, the communication module 450 may be used to help the user send and receive emails, browse web pages, and access streaming media.

尽管未示出,服务器还可以包括显示单元等,在此不再赘述。具体在本实施例中,服务器中的处理器410会按照如下的指令,将一个或一个以上的应用程序的进程对应的可执行文件加载到存储器420中,并由处理器410来运行存储在存储器420中的应用程序,从而实现前述实施例提供的任一方法。Although not shown, the server may further include a display unit, etc., which will not be described in detail herein. Specifically in this embodiment, the processor 410 in the server will load the executable files corresponding to the processes of one or more applications into the memory 420 according to the following instructions, and the processor 410 will run the application stored in the memory 420, thereby implementing any of the methods provided in the aforementioned embodiments.

以上各个方法的具体实施可参见前面的实施例,在此不再赘述。The specific implementation of the above methods can be found in the previous embodiments, which will not be described in detail here.

由上可知,多个待构建子表达式可以依据关联关系构建查询计划,使多个待构建子表达式在构建数据查询计划时不存在相互冲突,如此,便于构建数据查询计划,提升了查询性能。From the above, it can be seen that multiple sub-expressions to be constructed can construct a query plan based on the association relationship, so that there is no conflict between the multiple sub-expressions to be constructed when constructing the data query plan. This makes it easier to construct the data query plan and improves the query performance.

本领域普通技术人员可以理解,上述实施例的各种方法中的全部或部分步骤可以通过指令来完成,或通过指令控制相关的硬件来完成,该指令可以存储于一计算机可读存储介质中,并由处理器进行加载和执行。A person of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be completed by instructions, or by controlling related hardware through instructions. The instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.

为此,本申请实施例提供一种计算机可读存储介质,其中存储有多条指令,该指令能够被处理器进行加载,以执行本申请实施例所提供的任一种查询计划构建方法中的步骤。例如,该指令可以执行前述实施例提供的任一方法。To this end, an embodiment of the present application provides a computer-readable storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any query plan construction method provided in the embodiment of the present application. For example, the instructions can execute any method provided in the aforementioned embodiment.

其中,该存储介质可以包括:只读存储器(ROM,Read Only Memory)、随机存取记忆体(RAM,Random Access Memory)、磁盘或光盘等。Among them, the storage medium may include: read-only memory (ROM), random access memory (RAM), disk or CD, etc.

根据本申请的一个方面,提供了一种计算机程序产品,该计算机程序产品包括计算机程序/指令,该计算机程序/指令存储在计算机可读存储介质中。计算机设备的处理器从计算 机可读存储介质读取该计算机程序/指令,处理器执行该计算机程序/指令,使得该计算机设备执行上述实施例中提供的查询计划构建方面的各种可选实现方式中提供的方法。According to one aspect of the present application, a computer program product is provided, the computer program product comprising a computer program/instructions stored in a computer readable storage medium. The machine-readable storage medium reads the computer program/instructions, and the processor executes the computer program/instructions, so that the computer device executes the methods provided in various optional implementations of the query plan construction provided in the above embodiments.

由于该存储介质中所存储的指令,可以执行本申请实施例所提供的任一种查询计划构建方法中的步骤,因此,可以实现本申请实施例所提供的任一种查询计划构建方法所能实现的有益效果,详见前面的实施例,在此不再赘述。Since the instructions stored in the storage medium can execute the steps in any query plan construction method provided in the embodiments of the present application, the beneficial effects that can be achieved by any query plan construction method provided in the embodiments of the present application can be achieved. Please refer to the previous embodiments for details and will not be repeated here.

以上对本申请实施例所提供的一种查询计划构建方法、装置、电子设备和存储介质进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。 The above is a detailed introduction to a query plan construction method, device, electronic device and storage medium provided in the embodiments of the present application. Specific examples are used in this article to illustrate the principles and implementation methods of the present application. The description of the above embodiments is only used to help understand the method of the present application and its core idea; at the same time, for technical personnel in this field, according to the ideas of the present application, there will be changes in the specific implementation methods and application scope. In summary, the content of this specification should not be understood as a limitation on the present application.

Claims (15)

一种查询计划构建方法,所述方法由电子设备执行,包括:A query plan construction method, the method being executed by an electronic device, comprising: 获取数据查询表达式,所述数据查询表达式包括关键词;Obtaining a data query expression, wherein the data query expression includes keywords; 根据所述关键词,得到所述数据查询表达式中的基础子表达式,以及所述基础子表达式之间的关联关系;According to the keywords, basic sub-expressions in the data query expression and the association relationship between the basic sub-expressions are obtained; 对所述基础子表达式进行扩展,得到所述基础子表达式的等效子表达式,所述基础子表达式和所述基础子表达式的等效子表达式具有等效的表达含义;Expanding the basic sub-expression to obtain an equivalent sub-expression of the basic sub-expression, wherein the basic sub-expression and the equivalent sub-expression of the basic sub-expression have equivalent expression meanings; 从所述基础子表达式和所述基础子表达式的等效子表达式中确定待构建子表达式;Determine a sub-expression to be constructed from the basic sub-expression and the equivalent sub-expression of the basic sub-expression; 通过所述待构建子表达式和所述关联关系构建数据查询计划。A data query plan is constructed using the sub-expressions to be constructed and the association relationships. 如权利要求1所述的查询计划构建方法,所述从所述基础子表达式和所述基础子表达式的等效子表达式中确定待构建子表达式,包括:The query plan construction method according to claim 1, wherein determining the sub-expression to be constructed from the basic sub-expression and the equivalent sub-expression of the basic sub-expression comprises: 获取第一条件数量和第二条件数量,所述第一条件数量为所述基础子表达式关联的数据连接条件的条件数量,所述第二条件数量为所述基础子表达式的等效子表达式关联的数据连接条件的条件数量;Acquire a first condition quantity and a second condition quantity, wherein the first condition quantity is the condition quantity of the data connection condition associated with the basic sub-expression, and the second condition quantity is the condition quantity of the data connection condition associated with the equivalent sub-expression of the basic sub-expression; 根据所述第一条件数量和所述第二条件数量,从所述基础子表达式和所述基础子表达式的等效子表达式中确定所述待构建子表达式。The sub-expression to be constructed is determined from the basic sub-expression and the equivalent sub-expression of the basic sub-expression according to the first conditional quantity and the second conditional quantity. 如权利要求1所述的查询计划构建方法,所述从所述基础子表达式和所述基础子表达式的等效子表达式中确定待构建子表达式,包括:The query plan construction method according to claim 1, wherein determining the sub-expression to be constructed from the basic sub-expression and the equivalent sub-expression of the basic sub-expression comprises: 获取所述基础子表达式关联的待查询数据的元数据信息;Obtaining metadata information of the to-be-queried data associated with the basic sub-expression; 从所述元数据信息中获取所述待查询数据的数据量;Acquire the data volume of the to-be-queried data from the metadata information; 根据所述数据量,从所述基础子表达式和所述基础子表达式的等效子表达式中确定待构建子表达式。According to the data amount, a sub-expression to be constructed is determined from the basic sub-expression and the equivalent sub-expression of the basic sub-expression. 如权利要求1所述的查询计划构建方法,所述通过所述待构建子表达式和所述关联关系构建数据查询计划,包括:The query plan construction method according to claim 1, wherein the step of constructing a data query plan using the sub-expressions to be constructed and the association relationship comprises: 从所述数据查询表达式中获取所述待构建子表达式关联的当前查询优化请求;Acquire a current query optimization request associated with the sub-expression to be constructed from the data query expression; 通过所述待构建子表达式构建初始子查询计划;Constructing an initial subquery plan through the sub-expression to be constructed; 根据所述当前查询优化请求和所述关联关系,从所述初始子查询计划中确定待处理子查询计划;Determining a to-be-processed sub-query plan from the initial sub-query plan according to the current query optimization request and the association relationship; 对所述待处理子查询计划进行合并处理,得到所述数据查询计划。The sub-query plans to be processed are merged to obtain the data query plan. 如权利要求4所述的查询计划构建方法,所述通过所述待构建子表达式构建初始子查询计划,包括:The query plan construction method according to claim 4, wherein the step of constructing an initial subquery plan using the subexpression to be constructed comprises: 获取所述待构建子表达式关联的查询运算符;Obtaining the query operator associated with the sub-expression to be constructed; 根据所述待构建子表达式,构建所述查询运算符之间的连接关系;Constructing a connection relationship between the query operators according to the sub-expressions to be constructed; 控制所述查询运算符按照所述连接关系进行排布,得到所述初始子查询计划。The query operators are controlled to be arranged according to the connection relationship to obtain the initial sub-query plan. 如权利要求4所述的查询计划构建方法,所述根据所述当前查询优化请求和所述关联关系,从所述初始子查询计划中确定待处理子查询计划,包括: The query plan construction method according to claim 4, wherein determining the sub-query plan to be processed from the initial sub-query plan according to the current query optimization request and the association relationship comprises: 根据所述当前查询优化请求,获取所述待构建子表达式关联的待查询数据的元数据信息;According to the current query optimization request, metadata information of the to-be-queried data associated with the to-be-constructed sub-expression is obtained; 通过所述待查询数据的元数据信息,分析所述初始子查询计划执行时所需的计算量;Analyzing the amount of computation required when executing the initial subquery plan based on the metadata information of the data to be queried; 根据所述计算量和所述关联关系,从所述初始子查询计划中确定所述待处理子查询计划。The to-be-processed sub-query plan is determined from the initial sub-query plan according to the calculation amount and the association relationship. 如权利要求4所述的查询计划构建方法,所述待构建子表达式包括第一表达式和第二表达式,所述第一表达式用于查询待查询数据,所述第二表达式用于筛选所述待查询数据,所述根据所述当前查询优化请求和所述关联关系,从所述初始子查询计划中确定待处理子查询计划,包括:The query plan construction method according to claim 4, wherein the sub-expression to be constructed includes a first expression and a second expression, the first expression is used to query the data to be queried, and the second expression is used to filter the data to be queried, and the determining the sub-query plan to be processed from the initial sub-query plan according to the current query optimization request and the association relationship comprises: 根据所述第一表达式关联的当前查询优化请求,从第一初始子查询计划中确定第一子查询计划,所述第一初始子查询计划为通过所述第一表达式构建的所述初始子查询计划;Determine, according to the current query optimization request associated with the first expression, a first sub-query plan from a first initial sub-query plan, wherein the first initial sub-query plan is the initial sub-query plan constructed by using the first expression; 基于所述第一子查询计划满足的当前查询优化请求,从所述第一表达式关联的当前查询优化请求中确定未满足请求;Based on the current query optimization request satisfied by the first subquery plan, determining unsatisfied requests from the current query optimization request associated with the first expression; 根据所述未满足请求和所述关联关系,从所述第二表达式关联的当前查询优化请求中确定最终优化请求,所述最终优化请求包括所述未满足请求;Determine, according to the unsatisfied request and the association relationship, a final optimization request from the current query optimization requests associated with the second expression, wherein the final optimization request includes the unsatisfied request; 根据所述最终优化请求,从第二初始子查询计划中确定第二子查询计划,所述第二初始子查询计划为通过所述第二表达式构建的所述初始子查询计划,所述待处理子查询计划包括所述第一子查询计划和所述第二子查询计划。According to the final optimization request, a second sub-query plan is determined from a second initial sub-query plan, wherein the second initial sub-query plan is the initial sub-query plan constructed by the second expression, and the sub-query plan to be processed includes the first sub-query plan and the second sub-query plan. 如权利要求4所述的查询计划构建方法,所述方法还包括:The query plan construction method according to claim 4, further comprising: 获取所述待构建子表达式关联的历史查询优化请求,以及与所述历史查询优化请求对应的历史查询计划;Obtaining a historical query optimization request associated with the sub-expression to be constructed, and a historical query plan corresponding to the historical query optimization request; 从所述历史查询优化请求中确定所述当前查询优化请求对应的待处理历史请求;Determine, from the historical query optimization requests, a to-be-processed historical request corresponding to the current query optimization request; 若存在所述待处理历史请求,将所述待处理历史请求对应的历史查询计划作为所述待处理子查询计划;If the pending historical request exists, the historical query plan corresponding to the pending historical request is used as the pending sub-query plan; 若不存在所述待处理历史请求,触发执行所述通过所述待构建子表达式构建所述初始子查询计划的步骤。If the to-be-processed historical request does not exist, the step of constructing the initial subquery plan through the to-be-constructed sub-expression is triggered to execute. 如权利要求8所述的查询计划构建方法,所述获取所述待构建子表达式关联的历史查询优化请求,以及与所述历史查询优化请求对应的历史查询计划,包括:The query plan construction method according to claim 8, wherein the step of obtaining the historical query optimization request associated with the sub-expression to be constructed and the historical query plan corresponding to the historical query optimization request comprises: 获取所述待构建子表达式关联的哈希表,所述哈希表包括历史查询优化请求,以及所述历史查询优化请求关联的第一标签和计划调取链接,所述第一标签为所述历史查询优化请求经过哈希函数计算后得到的哈希值;Obtaining a hash table associated with the sub-expression to be constructed, wherein the hash table includes a historical query optimization request, and a first label and a plan retrieval link associated with the historical query optimization request, wherein the first label is a hash value obtained after the historical query optimization request is calculated by a hash function; 所述从所述历史查询优化请求中确定所述当前查询优化请求对应的待处理历史请求,包括:The determining, from the historical query optimization requests, the to-be-processed historical request corresponding to the current query optimization request includes: 确定第二标签,所述第二标签为所述当前查询优化请求经过所述哈希函数计算后得到的哈希值;Determine a second tag, where the second tag is a hash value of the current query optimization request obtained after the hash function is used to calculate the current query optimization request; 根据所述第一标签和所述第二标签,从所述历史查询优化请求中确定所述当前查询优化请求对应的待处理历史请求; Determine, from the historical query optimization requests, a to-be-processed historical request corresponding to the current query optimization request according to the first tag and the second tag; 所述若存在所述待处理历史请求,将所述待处理历史请求对应的历史查询计划作为所述待处理子查询计划,包括:If the pending historical request exists, taking the historical query plan corresponding to the pending historical request as the pending sub-query plan includes: 若存在所述待处理历史请求,采用所述待处理历史请求关联的计划调取链接,调取所述历史查询优化请求对应的历史查询计划作为所述待处理子查询计划。If the pending historical request exists, the plan retrieval link associated with the pending historical request is used to retrieve the historical query plan corresponding to the historical query optimization request as the pending sub-query plan. 如权利要求1-9任一项所述的查询计划构建方法,所述获取数据查询表达式,包括:The query plan construction method according to any one of claims 1 to 9, wherein obtaining the data query expression comprises: 获取数据查询命令,所述数据查询命令包括所述关键词;Acquire a data query command, wherein the data query command includes the keyword; 根据所述关键词,对所述数据查询命令进行语法分析处理,得到所述数据查询命令的语法树;According to the keywords, the data query command is subjected to syntax analysis to obtain a syntax tree of the data query command; 采用预设数据交换语言,对所述语法树进行格式转换处理,得到所述数据查询表达式。The syntax tree is format converted using a preset data exchange language to obtain the data query expression. 如权利要求1-9任一项所述的查询计划构建方法,在所述通过所述待构建子表达式和所述关联关系构建数据查询计划之后,还包括:The query plan construction method according to any one of claims 1 to 9, after constructing the data query plan by using the sub-expression to be constructed and the association relationship, further comprising: 确定所述数据查询计划对应的查询指令,以及所述查询指令的副本指令;Determine a query instruction corresponding to the data query plan, and a copy instruction of the query instruction; 将所述副本指令发送至数据库的多个数据存储节点中,以使所述数据存储节点依据所述副本指令返回查询数据。The replica instruction is sent to multiple data storage nodes of the database, so that the data storage nodes return the query data according to the replica instruction. 一种查询计划构建装置,所述装置部署在电子设备上,包括:A query plan construction device, the device being deployed on an electronic device, comprising: 获取单元,用于获取数据查询表达式,所述数据查询表达式包括关键词;An acquisition unit, used to acquire a data query expression, wherein the data query expression includes a keyword; 关键单元,用于根据所述关键词,得到所述数据查询表达式中的基础子表达式,以及所述基础子表达式之间的关联关系;A key unit, used to obtain basic sub-expressions in the data query expression and the association relationship between the basic sub-expressions according to the keywords; 等效单元,用于对所述基础子表达式进行扩展,得到所述基础子表达式的等效子表达式,所述基础子表达式和所述基础子表达式的等效子表达式具有等效的表达含义;An equivalent unit, used for expanding the basic sub-expression to obtain an equivalent sub-expression of the basic sub-expression, wherein the basic sub-expression and the equivalent sub-expression of the basic sub-expression have equivalent expression meanings; 确定单元,用于从所述基础子表达式和所述基础子表达式的等效子表达式中确定待构建子表达式;A determination unit, configured to determine a sub-expression to be constructed from the basic sub-expression and an equivalent sub-expression of the basic sub-expression; 构建单元,用于通过所述待构建子表达式和所述关联关系构建数据查询计划。A construction unit is used to construct a data query plan through the sub-expression to be constructed and the association relationship. 一种电子设备,包括处理器和存储器,所述存储器存储有多条指令;所述处理器从所述存储器中加载指令,以执行如权利要求1~11任一项所述的查询计划构建方法中的步骤。An electronic device comprises a processor and a memory, wherein the memory stores a plurality of instructions; the processor loads instructions from the memory to execute the steps in the query plan construction method as described in any one of claims 1 to 11. 一种计算机可读存储介质,所述计算机可读存储介质存储有多条指令,所述指令适于处理器进行加载,以执行权利要求1~11任一项所述的查询计划构建方法中的步骤。A computer-readable storage medium stores a plurality of instructions, wherein the instructions are suitable for being loaded by a processor to execute the steps in the query plan construction method according to any one of claims 1 to 11. 一种计算机程序产品,包括计算机程序/指令,所述计算机程序/指令被处理器执行时实现权利要求1~11任一项所述的查询计划构建方法中的步骤。 A computer program product comprises a computer program/instruction, wherein when the computer program/instruction is executed by a processor, the steps in the query plan construction method according to any one of claims 1 to 11 are implemented.
PCT/CN2024/083246 2023-05-19 2024-03-22 Query plan construction method and apparatus, electronic device and storage medium WO2024239782A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310575987.9A CN116975098A (en) 2023-05-19 2023-05-19 Query plan construction method, device, electronic equipment and storage medium
CN202310575987.9 2023-05-19

Publications (1)

Publication Number Publication Date
WO2024239782A1 true WO2024239782A1 (en) 2024-11-28

Family

ID=88475650

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/083246 WO2024239782A1 (en) 2023-05-19 2024-03-22 Query plan construction method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN116975098A (en)
WO (1) WO2024239782A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119597797A (en) * 2025-02-10 2025-03-11 北京科杰科技有限公司 Database query method based on tree structure conversion and HIVE multi-table union

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116975098A (en) * 2023-05-19 2023-10-31 深圳市腾讯计算机系统有限公司 Query plan construction method, device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5822747A (en) * 1996-08-23 1998-10-13 Tandem Computers, Inc. System and method for optimizing database queries
US6021405A (en) * 1996-08-23 2000-02-01 Tandem Computers, Inc. System and method for optimizing database queries with improved performance enhancements
CN101984439A (en) * 2010-12-09 2011-03-09 上海市共进通信技术有限公司 Method for realizing optimization of data source extensive makeup language (XML) query system based on sub-queries
CN112395303A (en) * 2019-08-15 2021-02-23 阿里巴巴集团控股有限公司 Query execution method and device, electronic equipment and computer readable medium
CN116975098A (en) * 2023-05-19 2023-10-31 深圳市腾讯计算机系统有限公司 Query plan construction method, device, electronic equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5822747A (en) * 1996-08-23 1998-10-13 Tandem Computers, Inc. System and method for optimizing database queries
US6021405A (en) * 1996-08-23 2000-02-01 Tandem Computers, Inc. System and method for optimizing database queries with improved performance enhancements
CN101984439A (en) * 2010-12-09 2011-03-09 上海市共进通信技术有限公司 Method for realizing optimization of data source extensive makeup language (XML) query system based on sub-queries
CN112395303A (en) * 2019-08-15 2021-02-23 阿里巴巴集团控股有限公司 Query execution method and device, electronic equipment and computer readable medium
CN116975098A (en) * 2023-05-19 2023-10-31 深圳市腾讯计算机系统有限公司 Query plan construction method, device, electronic equipment and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119597797A (en) * 2025-02-10 2025-03-11 北京科杰科技有限公司 Database query method based on tree structure conversion and HIVE multi-table union

Also Published As

Publication number Publication date
CN116975098A (en) 2023-10-31

Similar Documents

Publication Publication Date Title
US9372890B2 (en) Methods, systems, and computer-readable media for providing a query layer for cloud databases
US10983967B2 (en) Creation of a cumulative schema based on an inferred schema and statistics
US8935232B2 (en) Query execution systems and methods
US9053210B2 (en) Graph query processing using plurality of engines
US7103590B1 (en) Method and system for pipelined database table functions
US20190026336A1 (en) Retrieving data from a data storage system
US10311055B2 (en) Global query hint specification
US20170083573A1 (en) Multi-query optimization
WO2024239782A1 (en) Query plan construction method and apparatus, electronic device and storage medium
CN110019314B (en) Dynamic data encapsulation method, client and server based on data item analysis
US11507555B2 (en) Multi-layered key-value storage
CN116361487A (en) A multi-source heterogeneous policy knowledge map construction and storage method and system
CN105630881A (en) Data storage method and query method for RDF (Resource Description Framework)
Zhang et al. Towards efficient join processing over large RDF graph using mapreduce
Ali et al. Storage, indexing, query processing, and benchmarking in centralized and distributed RDF engines: a survey
KR20180077830A (en) Processing method for a relational query in distributed stream processing engine based on shared-nothing architecture, recording medium and device for performing the method
US8832157B1 (en) System, method, and computer-readable medium that facilitates efficient processing of distinct counts on several columns in a parallel processing system
CN117149825A (en) Route search method, device, electronic equipment and storage medium
Kaoudi et al. Cloud-based RDF data management
Saleem Storage, indexing, query processing, and benchmarking in centralized and distributed RDF engines: a survey
US12265514B1 (en) Efficient sorting techniques facilitating the creation and use of dataset summary metadata
US12265535B1 (en) Dataset summary metadata providing improved query performance
Zhu et al. Hydb: Access optimization for data-intensive service
EP4546154A1 (en) Relational subtree matching for improved query performance
WO2025180404A1 (en) Data processing method, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24810031

Country of ref document: EP

Kind code of ref document: A1

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载