这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@jberryman
Copy link
Collaborator

@jberryman jberryman commented Jan 3, 2020

We upload a set of accumulating timers and counters to track service
time for different types of operations, across several dimensions (e.g.
did we hit the plan cache, was a remote involved, etc.)

Description

Affected components

  • Server
  • Tests

Related Issues

#3552 although metrics should hopefully give us general visibility into performance

Solution and Design

Steps to test and verify

run tests

Limitations, known bugs & workarounds

  • marked an unsafe forked thread as TODO immortal, but we should fix all of these together (see Audit uses of forkIO #3768 )
  • I didn't document changes in telemetry.rst; that doc is already stale. It would be nice if we could generate it from tests

Server checklist

Catalog upgrade

Does this PR change Hasura Catalog version?

  • No
  • Yes
    • Updated docs with SQL for downgrading the catalog

Metadata

Does this PR add a new Metadata feature?

  • No
  • Yes
    • Does run_sql auto manages the new metadata through schema diffing?
      • Yes
      • Not required
    • Does run_sql auto manages the definitions of metadata on renaming?
      • Yes
      • Not required
    • Does export_metadata/replace_metadata supports the new metadata added?
      • Yes
      • Not required

GraphQL

  • No new GraphQL schema is generated
  • New GraphQL schema is being generated:
    • New types and typenames are correlated

Breaking changes

  • No Breaking changes

  • There are breaking changes:

    1. Metadata API

      Existing query types:

      • Modify args payload which is not backward compatible
      • Behavioural change of the API
      • Change in response JSON schema
      • Change in error code
    2. GraphQL API

      Schema Generation:

      • Change in any NamedType
      • Change in table field names

      Schema Resolve:-

      • Change in treatment of null value for any input fields
    3. Logging

      • Log JSON schema has changed
      • Log type names have changed

@jberryman jberryman requested a review from lexi-lambda as a code owner January 3, 2020 05:49
@netlify
Copy link

netlify bot commented Jan 3, 2020

Deploy preview for hasura-docs ready!

Built with commit ee474ac

https://deploy-preview-3622--hasura-docs.netlify.com

@hasura-bot
Copy link
Contributor

Review app for commit 5fd892f deployed to Heroku: https://hge-ci-pull-3622.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3622-5fd892f7-dirty

@jberryman jberryman force-pushed the 3552-gather-usage-telemetry-for-cache branch from 5fd892f to c8d913f Compare January 6, 2020 19:13
@jberryman jberryman requested a review from 0x777 January 6, 2020 19:15
@jberryman
Copy link
Collaborator Author

@0x777
I decided to base the resolution to #3547 on the work mentioned in the body of this PR to use some of the timing utilities. Can you please review 29b2963 ?

@hasura-bot
Copy link
Contributor

Review app for commit c8d913f deployed to Heroku: https://hge-ci-pull-3622.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3622-c8d913f8-dirty

@jberryman jberryman force-pushed the 3552-gather-usage-telemetry-for-cache branch from c8d913f to fdb4918 Compare January 17, 2020 00:37
@hasura-bot
Copy link
Contributor

Review app for commit fdb4918 deployed to Heroku: https://hge-ci-pull-3622.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3622-fdb49182-dirty

@jberryman jberryman force-pushed the 3552-gather-usage-telemetry-for-cache branch from fdb4918 to ee474ac Compare January 17, 2020 03:45
@hasura-bot
Copy link
Contributor

Review app for commit ee474ac deployed to Heroku: https://hge-ci-pull-3622.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3622-ee474ac0-dirty

@jberryman jberryman force-pushed the 3552-gather-usage-telemetry-for-cache branch from ee474ac to 484ad79 Compare January 21, 2020 14:24
@netlify
Copy link

netlify bot commented Jan 21, 2020

Deploy preview for hasura-docs ready!

Built with commit 484ad79

https://deploy-preview-3622--hasura-docs.netlify.com

@jberryman
Copy link
Collaborator Author

@lexi-lambda @0x777 this is ready for review again

Copy link
Contributor

@lexi-lambda lexi-lambda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve left a few small comments, but I’m going to defer to @0x777 to review the bulk of this change, since I think he has more context on it than I do, anyway (and I have plenty of other things to look at).

@hasura-bot
Copy link
Contributor

Review app for commit 484ad79 deployed to Heroku: https://hge-ci-pull-3622.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3622-484ad79c

0x777
0x777 previously approved these changes Jan 23, 2020
@jberryman jberryman force-pushed the 3552-gather-usage-telemetry-for-cache branch from 484ad79 to 94719b2 Compare January 23, 2020 17:29
@jberryman
Copy link
Collaborator Author

@lexi-lambda I think this should be ready now.

@jberryman jberryman requested a review from lexi-lambda January 27, 2020 15:37
@lexi-lambda lexi-lambda mentioned this pull request Jan 28, 2020
33 tasks
@jberryman jberryman force-pushed the 3552-gather-usage-telemetry-for-cache branch from 94719b2 to ef79555 Compare January 30, 2020 18:26
@hasura-bot
Copy link
Contributor

Review app for commit ef79555 deployed to Heroku: https://hge-ci-pull-3622.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3622-ef795559


-- | 'RequestTimings' along with the count
data RequestTimingsCount =
RequestTimingsCount {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current model does not seem to enable the ability to calculate percentiles, which are extremely useful for performance tracking. Would you guys be open to explore an alternative design that would enable calculations beyond totals and averages?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now for the purposes of telemetry we're just collecting counts and timings for a few hand chosen buckets, which should be fine for our purposes, rather than using hdrhistogram or something similar. But I agree with you that percentiles are useful!

@jberryman
Copy link
Collaborator Author

jberryman commented Jan 30, 2020

@0x777 or @lexi-lambda

This should address Alexis's latest requested changes (use DiffTime instead of NominalDiffTime). History got somewhat worse as I didn't want to spend the time to again split up all the changes and squash them individually. I also added a few comments, added a fromUnits for conversions between units (including difftime and nominaldifftime), as I discovered realToFrac was a footgun.

0x777
0x777 previously approved these changes Jan 31, 2020
Copy link
Member

@0x777 0x777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hasura-bot
Copy link
Contributor

Review app for commit 2d95118 deployed to Heroku: https://hge-ci-pull-3622.herokuapp.com
Docker image for server: hasura/graphql-engine:pull3622-2d951189

…3552

We upload a set of accumulating timers and counters to track service
time for different types of operations, across several dimensions (e.g.
did we hit the plan cache, was a remote involved, etc.)

Also...

Standardize on DiffTime as a standard duration type, and try to use it
consistently.

See discussion here:
hasura#3584 (review)

It should be possible to overwrite that module so the new threadDelay
sticks per the pattern in hasura#3705 blocked on hasura#3558

Rename the Control.Concurrent.Extended.threadDelay to `sleep` since a
naive use with a literal argument would be very bad!

We catch a bug in 'computeTimeDiff'.

Add convenient 'Read' instances to the time unit utility types. Make
'Second' a newtype to support this.
Add a new 'request_read_time' to the logging record.

We also use 'withElapsedTime' to get proper monotonic clock for
'query_execution_time' as well.
@lexi-lambda lexi-lambda force-pushed the 3552-gather-usage-telemetry-for-cache branch from 2d95118 to b863dc2 Compare February 4, 2020 00:50
@lexi-lambda lexi-lambda merged commit c90c75f into hasura:master Feb 4, 2020
@hasura-bot
Copy link
Contributor

Review app https://hge-ci-pull-3622.herokuapp.com is deleted

@jberryman jberryman deleted the 3552-gather-usage-telemetry-for-cache branch February 4, 2020 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants