这是indexloc提供的服务,不要输入任何密码
Skip to content

rfc (wip): parameterized queries #7473

@0x777

Description

@0x777

Parameterized Queries

Motivation

While graphql-engine's auto-generated API is reasonably expressive, it doesn't
really compete with the expressivity of a query language that a database
provides. You can express a lot more queries through SQL on Postgres, use a lot
more operators on Postgres than through graphql-engine's Postgres API.

We want to provide our users an ability to fallback to a databases's query
language when they run into cases where graphql-engine's API is not adequate.
Currently graphql-engine addresses this by exposing GraphQL API over
views
and
functions.
For example, a view can be defined as follows:

create view page_items as
  select title, content from articles
  union all
  select title, description from alerts

and graphql-engine provides this API:

query {
  page_items(where: {..} limit: .. offset: ..) {
    ...
  }
}

and similarly for the following function

CREATE FUNCTION search_articles(search text)
RETURNS SETOF articles AS $$
  SELECT *
  FROM articles
  WHERE
    title ilike ('%' || search || '%')
    OR content ilike ('%' || search || '%')
$$ LANGUAGE sql STABLE;

graphql-engine provides this API:

query {
  search_articles(args: {search: "something"} where: .. limit: .. offset: ..) {
    ...
  }
}

While the above approach works, it has a couple of shortcomings:

1. Requirement of DDL privileges

Our users need DDL privileges to create these new functions or views. This
privilege is not easily granted, particularly in an enterprise setup. The less
we require of a database, the easier it would be for our users to add
graphql-engine in their stack.

2. Restrictions on the function return type

The functions that graphql-engine can support currently need to return a
'table' or 'setof table' that's already tracked. There are couple of issues
here:

  1. Does not work for databases (such as SQL Server) that do not have
    the notion of functions returning named table types.

  2. Does not support functions that do not return a table. For example, if the
    above search_articles function only returns title and content,
    graphql-engine cannot track it.

    CREATE FUNCTION search_page_items(search text)
    RETURNS SETOF TABLE (title text, content text) AS $$
      SELECT title, content
      FROM articles
      WHERE
        title ilike ('%' || search || '%')
        OR content ilike ('%' || search || '%')
    $$ LANGUAGE sql STABLE;

    To work around this, users go through this elaborate process:

    1. Create a table with the schema of the function return type
    2. Track the table
    3. Change the definition of the function to return the newly created table
    4. Track the function

    which is terrible DX

To summarize the motivation is this:

We want our users to use the full query language ability of a database
to extend graphql-engine's auto-generated API while addressing the shortcomings
of the existing features (GraphQL API views and functions).

Parameterized queries

We want to provide a means to define views and functions at graphql-engine's
layer.

  1. A view's definition is a database select query so instead of creating a
    view, we can just accept the select query which backs the view. For example,
    we want to allow adding page_items root field as follows:

    root_field_name: page_items
    sql_query: |
      select title, content from articles
      union all
      select title, description from alerts

    This should create the same GraphQL API for page_items field as it did
    when we created the view page_items and tracked it:

    type query_root {
      ...
      page_items(distint_on: .. where: .. limit: .. offset: ..)
      page_items_aggregate(where: .. limit: .. offset: ..)
    }
    type subscription_root {
      ...
      page_items(distint_on: .. where: .. limit: .. offset: ..)
      page_items_aggregate(where: .. limit: .. offset: ..)
    }
  2. A function's body may or may not be a simple query, but let's only consider
    the cases where a function's body is a query such as search_articles. In
    case of search_articles, we want to allow search_articles root field
    as follows:

    root_field_name: search_articles
    arguments:
    - $1: search
      type: String
    sql_query: |
      SELECT *
      FROM articles
      WHERE
        title ilike ('%' || $1 || '%')
        OR content ilike ('%' || $1 || '%')

    (Note how the search parameter used in search_articles function's body
    is replaced by $1 in the equivalent parameterized query)

    This should create the same GraphQL API for search_articles field as it
    did when we created the search_articles function and tracked it.

    type query_root {
      ...
      search_articles(args: {search: ..} where: .. limit: .. offset: ..)
      search_articles_aggregate(args: {search: ..} where: .. limit: .. offset: ..)
    }
    type subscription_root {
      ...
      search_articles(args: {search: ..} where: .. limit: .. offset: ..)
      search_articles_aggregate(args: {search: ..} where: .. limit: .. offset: ..)
    }

In short, users can specify 'parameterized' database queries instead of
defining functions and views on the database to add custom root fields

Note that the restrictions on function types that are mentioned in the
motivation section can be addressed by extending our functions support. We can
let our users specify the name of the table that a function returns (currently
we get it from the function's type signature on the database) or a custom
schema when they don't return an existing table. However, addressing the 'no
ddl' requirement is more important and hence we are going forward with
parameterized queries.

Product requirements

  1. 'Parameterized queries' should be a superset of our current functions support:

    1. If the body of a function is a database query, one can use a parameterized
      query (like above).

    2. If the body of a function is procedural and is not a single database query,
      one can use a parameterized query as follows:

      select * from function($1, ... $n);
    3. Support for subscriptions and aggregate fields like we currently have for
      immutable functions

  2. Should allow mutations too - one should be able to mark a templated query as
    a query or a mutation so that the field can be added to query root or
    mutation root appropriately.

  3. Should be a superset of our current views support:

    1. Parameterized queries instead of defining views

      select * from articles where is_verified = true
    2. Parameterized queries to use existing views (verified_articles is a
      view on the database):

      select * from verified_articles;
    3. Support for subscriptions and aggregate fields like we currently have on views

  4. Support for relationships

    1. From a parameterized query's response:

      query {
        search_articles(args: {search: "something"}) {
          title
          content
          # author is a relationship from search_articles to author
          author {
            name
          }
        }
      }
    2. To a parametrized query:

      query {
        author {
          name
          # verified_articles is a relationship from author table to
          # 'verified_articles' paramaterized query
          verified_articles {
            title
            content
          }
        }
      }
  5. Should allow defining queries which do not return tables or views:

    select title, content from article;

    In such cases, we expect the user to define the schema of this query's return type:

    title:
      type: text
      nullable: true
    content:
      type: text
      nullable: true
  6. Support for permissions:

    1. We want to allow similar permissions like on tables/views.

      1. The columns that can be selected
      2. The rows that can be selected
      3. Whether the _aggregate should be generated
    2. Support for presetting parameters in a query to either static values or
      values from session variables (like in insert and update permissions).

Product Specification

(See the Discussion section for other approaches considered, the following is
based off suggestions made by Joe and Phil on the PR)

The biggest challenge is (5) - letting parameterized queries return table like
data whose schema is not captured by any of the existing tables/views that are
defined in the database. To address this, we introduce the notion of 'virtual
tables' which lets one define the schema of table data at graphql-engine's
layer. For example,

version:3
sources:
- name: default
  kind: mssql
  configuration: ..
  tables: ...
  virtual_tables:
  - name: average_rating
    fields:
    - name: num_articles
      type: int
    - name: average_rating
      type: float

defines a virtual table average_rating with fields name and average_rating.

Parameterized queries are defined per source. The return type of a
parameterized query can be one or more rows of a database table or a virtual
table. For example,

version: 3
sources:
- name: default
  kind: mssql
  configuration: ...
  tables:
  - table: article
  parameterized_queries:
  - name: get_article
    description: Get an article with an id
    query: |
      select * from articles where id = $1
    parameters:
    - param: $1
      name: article_id
      type: uuid
      optional: false
    response:
      return_type:
        table: article
        cardinality: one

defines a parameterized query get_article which returns a single row of type
article.

The following example defines a parameterized query author_ratings which
returns zero or many rows of the virtual table average_rating.

version: 3
sources:
- name: source_name
  kind: mssql
  configuration: ...
  virtual_tables:
  - name: average_rating
    fields:
    - name: num_articles
      type: int
    - name: average_rating
      type: float
  parameterized_queries:
  - name: author_ratings
    description: Ratings of articles of various authors
    query: |
      select author_id, avg(rating) as average_rating
      from article group_by author_id
    parameters: []
    response:
      return_type:
        virtual_table: average_rating
        cardinality: many

Virtual tables metadata

Virtual tables are defined under a field called virtual_tables under each
source.

virtual_tables:
  # name of the virtual table, should be unique in the source
  # this will also be used for the GraphQL type name
- name: name_of_the_virtual_table

  # fields of the virtual table
  fields:
    # name of the field
  - name: f1
    # the database type of the field
    type: database_scalar_type
    # whether a query which returns this type would return nulls for this
    # field
    nullable: true

Parameterized queries metadata

Parameterized queres are defined under a field called parameterized_queries
under each source.

version: 3
sources:
- name: default
  kind: postgres
  configuration: ...
  parameterized_queries:
  - ..
  - ..

The schema of each paramaterized query is defined as follows:

# The name of the parameterized query, which will also be the name
# of the field name that exposes this query in GraphQL API

name: author_ratings

# A description of this field, that will be published in the
# GraphQL API

comment: Ratings of articles of various authors

# A parameterized SQL query where parameters start with ':'
# See https://docs.sqlalchemy.org/en/13/core/tutorial.html#using-textual-sql
query: |
  select * from x where c1 = :c1

# additional information about the parameters that are defined in the query
parameters:

  # name of the parameter
- name: c1

  # the type of the parameter: database scalar type
  type: database_scalar_type

  # (optional) whether the parameter is required or not - shows up in the
  # GraphQL schema for the argument. `true` by default
  required: true_or_false

  # (optional) a default value for the parameter if not provided in the query
  default: default_value_for_the_scalar

# return type of the parameterized query
return_type:

  # can be one of virtual_table or table
  table: name_of_the_table
  virtual_table: name_of_the_virtual_table

  # whether the query returns a zero or one row
  # or whether it returns zero or many rows
  # maybe we can come up with a better name than cardinality?
  cardinality: one_or_many

Metadata

Metadata

Assignees

Labels

k/enhancementNew feature or improve an existing featurek/ideasDiscuss new ideas / pre-proposals / roadmapk/rfcRFC for a new feature / process

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions