这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@paf31
Copy link
Contributor

@paf31 paf31 commented Jul 28, 2020

This gives event trigger locks a 30 minute timeout. In case the graceful shutdown of a server fails, this will automatically unlock in-process rows after 30 minutes at most.

This PR is a work in progress and I haven't had chance to test it extensively yet, but we can use this as a place to discuss the approach.

Changelog

  • CHANGELOG.md is updated with user-facing content relevant to this PR. If no changelog is required, then add the no-changelog-required label.

Affected components

  • Server

Related Issues

Solution and Design

Steps to test and verify

Limitations, known bugs & workarounds

Server checklist

Catalog upgrade

Does this PR change Hasura Catalog version?

  • No
  • Yes
    • Updated docs with SQL for downgrading the catalog

Metadata

Does this PR add a new Metadata feature?

  • No

GraphQL

  • No new GraphQL schema is generated

Breaking changes

  • There are breaking changes:

This changes the type of the locked column to be a timestamp null instead of a boolean not null.

@hasura-bot
Copy link
Contributor

Review app for commit fe70d9f deployed to Heroku: https://hge-ci-pull-5473.herokuapp.com
Docker image for server: hasura/graphql-engine:pull5473-fe70d9fb

Copy link
Collaborator

@jberryman jberryman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

paf31 and others added 2 commits July 28, 2020 16:11
@hasura-bot
Copy link
Contributor

Review app for commit ca190ef deployed to Heroku: https://hge-ci-pull-5473.herokuapp.com
Docker image for server: hasura/graphql-engine:pull5473-ca190ef5

@tirumaraiselvan
Copy link
Contributor

@paf31 IIUC, we can now get rid of this?

prepareEvents _icPgPool logger

created_at TIMESTAMP DEFAULT NOW(),
locked BOOLEAN NOT NULL DEFAULT FALSE,
/* when locked IS NULL the event is unlocked and can be processed */
locked TIMESTAMP,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this will affect the behaviour, but I think it will be a good idea to use TIMESTAMPZ instead of TIMESTAMP because now() returns a TIMESTAMPZ and there may be issues when comparing a TIMESTAMPZ value with a TIMESTAMP value

@paf31
Copy link
Contributor Author

paf31 commented Jul 29, 2020

@paf31 IIUC, we can now get rid of this?

prepareEvents _icPgPool logger

That code unlocks all events if I understand correctly. I think it might be good to keep it, so that we have a way to quickly unlock everything without having to wait 30 minutes.

@hasura-bot
Copy link
Contributor

Review app for commit c93a00b deployed to Heroku: https://hge-ci-pull-5473.herokuapp.com
Docker image for server: hasura/graphql-engine:pull5473-c93a00b3

@paf31 paf31 marked this pull request as ready for review July 29, 2020 21:14
@paf31 paf31 requested a review from 0x777 July 29, 2020 21:14
@hasura-bot
Copy link
Contributor

Review app for commit 8cc548a deployed to Heroku: https://hge-ci-pull-5473.herokuapp.com
Docker image for server: hasura/graphql-engine:pull5473-8cc548ab

@hasura-bot
Copy link
Contributor

Review app for commit ab73923 deployed to Heroku: https://hge-ci-pull-5473.herokuapp.com
Docker image for server: hasura/graphql-engine:pull5473-ab73923a

@tirumaraiselvan
Copy link
Contributor

That code unlocks all events if I understand correctly. I think it might be good to keep it, so that we have a way to quickly unlock everything without having to wait 30 minutes.

We are tracking events which are in-memory and unlocking them during graceful shutdown, right?

This way a proper server restart should not wait 30 mins to retry a locked event.

Shouldn't this cover most cases? Otherwise, I am not really sure what is the ROI of changing the locked column from bool to timestamp. We are managing the same amount of state as before?

@jberryman
Copy link
Collaborator

Mm, yeah I agree we shouldn't be doing unlockAllEvents on startup anymore. Formerly I guess that was the only way to ensure at-least-once delivery in the case of a hard shutdown. But now that case is taken care of by having the timeout state in durable storage.

While this is still open, do we want to think more about the timeout for (re)delivery? Maybe it should be much shorter or be relative to the http client timeout value we use? Just a detail.

@hasura-bot
Copy link
Contributor

Review app for commit 5dee537 deployed to Heroku: https://hge-ci-pull-5473.herokuapp.com
Docker image for server: hasura/graphql-engine:pull5473-5dee5373

@paf31
Copy link
Contributor Author

paf31 commented Jul 30, 2020

While this is still open, do we want to think more about the timeout for (re)delivery?

Definitely. I wasn't sure about this so I just did the most naive thing possible to start with. Any ideas?

@jberryman
Copy link
Collaborator

Any ideas?

I'm not sure, and setting it too low is the real danger (sort of slow thundering herd situation... lumbering herd). Seems good for now

@tirumaraiselvan tirumaraiselvan added this to the v1.4 milestone Aug 6, 2020
@hasura-bot
Copy link
Contributor

Review app for commit b3856d8 deployed to Heroku: https://hge-ci-pull-5473.herokuapp.com
Docker image for server: hasura/graphql-engine:pull5473-b3856d82

@jberryman
Copy link
Collaborator

(sorry if this was awaiting another approval by me)

@paf31
Copy link
Contributor Author

paf31 commented Aug 12, 2020

(sorry if this was awaiting another approval by me)

Sorry, it may have, but I'm also being a little hesitant to merge this, since it is a little difficult to test and be sure that it is the best change, for arguably little benefit. Since the original problem is (I think) now addressed, I'm thinking we can hold off until I can write some tests for this. What do you think?

@hasura-bot
Copy link
Contributor

Review app for commit 4cc7394 deployed to Heroku: https://hge-ci-pull-5473.herokuapp.com
Docker image for server: hasura/graphql-engine:pull5473-4cc73949

@tirumaraiselvan
Copy link
Contributor

I have marked this for v1.4 release since we get a beta period to test this change.

Copy link
Contributor

@codingkarthik codingkarthik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jberryman
Copy link
Collaborator

(FWIW if you want to try using kodiak for auto-merge/update you can add the auto-update-auto-merge tag)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants