这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@lorenzo
Copy link
Contributor

@lorenzo lorenzo commented Aug 12, 2019

Description

  • Listens for SIGINT as the termination signal
  • Stops accepting new connections once the signal is received
  • Waits for all connections to be drained, before shutting down
  • Forcefully kills all pending connections after 30 seconds

Currently this does not send a close message to websocket clients, I'd
like to submit that change as a separate pull request, but at least this
solve my biggest concern which is not getting confirmation for mutations
while restarting the server.

Affected components

  • Server

Related Issues

refs #2698

Steps to test and verify

I haven't codified this as a test, although I'm happy to provide it if you guide me in running the test suite and your preferred solution. I did this:

Save this to a test.sh file

curl 'http://localhost:8080/v1/graphql' \
-XPOST \
-H 'Content-Type: application/json' \
-H 'Pragma: no-cache' \
-H 'Connection: keep-alive' \
--data-binary $VERY_SLOW_QUERY -v -s > /dev/null

Then watched it:

watch -n 0.8 ./test.sh

Waited for a few requests to go through then killed the server:

kill -s INT $PID

Verified that no broken socket errors occur, only the final connection refused

Limitations, known bugs & workarounds

This currently does not deal with gracefully shutting down websocket connections

* Listens for SIGINT as the termination signal
* Stops accepting new connections once the signal is received
* Waits for all connections to be drained, before shutting down
* Forcefully kills all pending connections after 30 seconds

Currently this does not send a close message to websocket clients, I'd
like to submit that change as a separate pull request, but at least this
solve my biggest concern which is not getting confirmation for mutations
while restarting the server.
@lorenzo lorenzo requested a review from lexi-lambda as a code owner August 12, 2019 11:35
@netlify
Copy link

netlify bot commented Aug 12, 2019

Deploy preview for hasura-docs ready!

Built with commit f32252e

https://deploy-preview-2717--hasura-docs.netlify.com

@hasura-bot
Copy link
Contributor

Review app for commit 52d1659 deployed to Heroku: https://hge-ci-pull-2717.herokuapp.com
Docker image for server: hasura/graphql-engine:pull2717-52d16598

@marionschleifer marionschleifer added the c/community Related to community content label Aug 12, 2019
@hasura-bot
Copy link
Contributor

Review app for commit 61f612a deployed to Heroku: https://hge-ci-pull-2717.herokuapp.com
Docker image for server: hasura/graphql-engine:pull2717-61f612a5

@jberryman
Copy link
Collaborator

Thanks for the contribution! The easiest way to run the integration tests is to do

$ scripts/dev.sh test

You can also run specific tests with -k , e.g.

$ scripts/dev.sh test -k "test_jsonb_has_all[http]"

The script is very new so do let us know if it works on your machine.

@lorenzo
Copy link
Contributor Author

lorenzo commented Aug 13, 2019

@jberryman It works on my machine, although I consistently get 2 failures in the master branch.

The test I would need to write requires that I kill the Hasura server, which is handled in the shell file and not in the python test harness. How do you suggest I implement this? The natural way would be to have the start/stop function in the python code and be able to start or stop the server per test if required.

@jberryman
Copy link
Collaborator

@lorenzo I appreciate you trying out the script.

I think it's not terribly important to have an automated test for this behavior right now, but I'll let @lexi-lambda chime in with her review. The script you've posted is super helpful though thank you!

Tested locally and change LGTM, didn't look closely at the failure we're getting on CI ^

@hasura-bot
Copy link
Contributor

Review app for commit 1c53104 deployed to Heroku: https://hge-ci-pull-2717.herokuapp.com
Docker image for server: hasura/graphql-engine:pull2717-1c531040

@hasura-bot
Copy link
Contributor

Review app for commit 48c6464 deployed to Heroku: https://hge-ci-pull-2717.herokuapp.com
Docker image for server: hasura/graphql-engine:pull2717-48c64647

@lexi-lambda
Copy link
Contributor

@lorenzo This looks good to me, but I have one question: given you’re listening on an arbitrary signal, anyway, why not just listen for SIGTERM instead of SIGINT and remove the Dockerfile change?

@lorenzo
Copy link
Contributor Author

lorenzo commented Aug 16, 2019

@lexi-lambda I’m fine with that too, just did not want to break backwards compatibility, since the only thing that actually stops the docker container, even before my change, is a SiGINT. I’m fine we removing that line in this PR if you think we should go that way.

lorenzo and others added 2 commits August 20, 2019 12:07
Added a helpful log line to the graceful shutdown sequence
@lorenzo
Copy link
Contributor Author

lorenzo commented Aug 20, 2019

@lexi-lambda I change the signal to SIGTERM I could not see a real benefit in using a different one, after all

@hasura-bot
Copy link
Contributor

Review app for commit 8efb539 deployed to Heroku: https://hge-ci-pull-2717.herokuapp.com
Docker image for server: hasura/graphql-engine:pull2717-8efb539b

@hasura-bot
Copy link
Contributor

Review app for commit 4427a25 deployed to Heroku: https://hge-ci-pull-2717.herokuapp.com
Docker image for server: hasura/graphql-engine:pull2717-4427a259

lexi-lambda
lexi-lambda previously approved these changes Aug 22, 2019
Copy link
Contributor

@lexi-lambda lexi-lambda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes seem fine to me. I have a slightly more elaborate approach I’ve been playing with on a branch involving converting SIGTERM signals to asynchronous exceptions (in the same way SIGINT signals are already converted by the RTS to UserInterrupt exceptions), so I might open a PR with that at some point, but I don’t know if there’s actually much of an advantage to it at the moment, anyway.

Thank you for your contribution!

@lorenzo
Copy link
Contributor Author

lorenzo commented Aug 22, 2019

@lexi-lambda interesting, what would the advantage of your approach be?

@lexi-lambda
Copy link
Contributor

what would the advantage of your approach be?

Mostly just consistency with what GHC already does. Originally I was thinking it might make it easier to implement graceful shutdown for things running on several different threads, but that isn’t actually true, since GHC only throws UserInterrupt to the main thread, anyway. It’s also very slightly simpler to handle both SIGTERM and SIGINT together, since you can just catch either exception. It’s not a big difference, though.

Anyway, it seems like, interestingly enough, these changes make the CLI tests fail! It looks like, for some reason or another, the graceful shutdown on SIGTERM causes the working tree to become dirty when it didn’t before. I am not immediately sure why that is the case.

@hasura-bot
Copy link
Contributor

Review app for commit c23e58e deployed to Heroku: https://hge-ci-pull-2717.herokuapp.com
Docker image for server: hasura/graphql-engine:pull2717-c23e58ec

@lorenzo
Copy link
Contributor Author

lorenzo commented Aug 22, 2019

@lexi-lambda I think it is trivial with this code to install the same signal handler for SIGINT as well. Do you want me to add thet

@lexi-lambda
Copy link
Contributor

If you would like, but there’s no need to.

@hasura-bot
Copy link
Contributor

Review app for commit 7ddcca1 deployed to Heroku: https://hge-ci-pull-2717.herokuapp.com
Docker image for server: hasura/graphql-engine:pull2717-7ddcca1e

@lorenzo
Copy link
Contributor Author

lorenzo commented Aug 22, 2019

@lexi-lambda I used my very limited knowledge on what I thought could be wrong in the tests, but I truly don't know what it is :(

@lorenzo
Copy link
Contributor Author

lorenzo commented Aug 23, 2019

@lexi-lambda I finally managed to make the test script pass

Copy link
Contributor

@lexi-lambda lexi-lambda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it was .tix files, that makes sense. Good catch.

@hasura-bot
Copy link
Contributor

Review app for commit e79d3b9 deployed to Heroku: https://hge-ci-pull-2717.herokuapp.com
Docker image for server: hasura/graphql-engine:pull2717-e79d3b91

@hasura-bot
Copy link
Contributor

Review app for commit f32252e deployed to Heroku: https://hge-ci-pull-2717.herokuapp.com
Docker image for server: hasura/graphql-engine:pull2717-f32252ea

Copy link
Member

@shahidhk shahidhk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@lexi-lambda lexi-lambda merged commit c7a2320 into hasura:master Aug 26, 2019
@hasura-bot
Copy link
Contributor

Review app https://hge-ci-pull-2717.herokuapp.com is deleted

@lorenzo lorenzo deleted the graceful-shutdown branch August 26, 2019 06:06
polRk pushed a commit to polRk/graphql-engine that referenced this pull request Feb 12, 2020
…asura#2717)

* Listens for SIGTERM as the termination signal
* Stops accepting new connections once the signal is received
* Waits for all connections to be drained, before shutting down
* Forcefully kills all pending connections after 30 seconds

Currently this does not send a close message to websocket clients, I'd
like to submit that change as a separate pull request, but at least this
solve my biggest concern which is not getting confirmation for mutations
while restarting the server.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c/community Related to community content

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants