这是indexloc提供的服务,不要输入任何密码
Skip to content

Make sure deployments never underscale on rollouts #11140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 28, 2021

Conversation

markusthoemmes
Copy link
Contributor

@markusthoemmes markusthoemmes commented Apr 8, 2021

Proposed Changes

Ref #11092

Sometimes we DO update the revision deployment, for example for config
changes and for updates to the queue-proxy. We want those updates to
have as little interruption as possible, so this requires the pods to be
rolled in a way that there is never less pods than requested.

Context on the rollout speed: This does not significantly impact the speed at which a rollout is done. As soon as the pods are Terminating, the next batch will be rolled, so this just makes sure we never dip below the replicas setting of the Deployment. Since our pods take ~40s to shut down anyway, this likely isn't even impacting the amount of resources used by the rollout significantly.

Release Note

Changed the rollout behavior of application deployment changes (due to Knative upgrade for example) to never have less ready posd than required.

@knative-prow-robot knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 8, 2021
@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label Apr 8, 2021
@knative-prow-robot knative-prow-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. area/API API objects and controllers labels Apr 8, 2021
@codecov
Copy link

codecov bot commented Apr 8, 2021

Codecov Report

Merging #11140 (e2cb911) into main (e57b86b) will increase coverage by 0.06%.
The diff coverage is 100.00%.

❗ Current head e2cb911 differs from pull request most recent head ef3a40d. Consider uploading reports for the commit ef3a40d to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main   #11140      +/-   ##
==========================================
+ Coverage   87.69%   87.75%   +0.06%     
==========================================
  Files         191      190       -1     
  Lines        9193     9173      -20     
==========================================
- Hits         8062     8050      -12     
+ Misses        879      869      -10     
- Partials      252      254       +2     
Impacted Files Coverage Δ
pkg/reconciler/revision/resources/deploy.go 100.00% <100.00%> (ø)
pkg/autoscaler/metrics/http_scrape_client.go 84.61% <0.00%> (-2.89%) ⬇️
pkg/reconciler/revision/reconcile_resources.go 80.95% <0.00%> (-2.39%) ⬇️
pkg/reconciler/configuration/configuration.go 86.71% <0.00%> (-1.57%) ⬇️
pkg/activator/net/revision_backends.go 91.32% <0.00%> (-0.70%) ⬇️
pkg/activator/handler/handler.go 95.12% <0.00%> (-0.54%) ⬇️
pkg/autoscaler/metrics/stats_scraper.go 91.44% <0.00%> (-0.33%) ⬇️
cmd/activator/main.go 0.00% <0.00%> (ø)
pkg/reconciler/revision/config/store.go 100.00% <0.00%> (ø)
pkg/reconciler/autoscaling/config/store.go 100.00% <0.00%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e57b86b...ef3a40d. Read the comment docs.

@markusthoemmes
Copy link
Contributor Author

/test bla

@knative-prow-robot
Copy link
Contributor

@markusthoemmes: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test pull-knative-serving-build-tests
  • /test pull-knative-serving-unit-tests
  • /test pull-knative-serving-upgrade-tests
  • /test pull-knative-serving-istio-latest-mesh
  • /test pull-knative-serving-istio-latest-mesh-tls
  • /test pull-knative-serving-istio-latest-no-mesh
  • /test pull-knative-serving-istio-latest-no-mesh-tls
  • /test pull-knative-serving-istio-stable-mesh
  • /test pull-knative-serving-istio-stable-mesh-tls
  • /test pull-knative-serving-istio-stable-no-mesh
  • /test pull-knative-serving-istio-stable-no-mesh-tls
  • /test pull-knative-serving-istio-stable-no-mesh-tls-with-http01
  • /test pull-knative-serving-kourier-stable
  • /test pull-knative-serving-kourier-stable-tls
  • /test pull-knative-serving-contour-latest
  • /test pull-knative-serving-contour-tls
  • /test pull-knative-serving-ambassador-latest
  • /test pull-knative-serving-ambassador-latest-tls
  • /test pull-knative-serving-kong-latest
  • /test pull-knative-serving-kong-latest-tls
  • /test pull-knative-serving-https

Use /test all to run the following jobs:

  • pull-knative-serving-build-tests
  • pull-knative-serving-unit-tests
  • pull-knative-serving-upgrade-tests
  • pull-knative-serving-istio-stable-no-mesh
  • pull-knative-serving-istio-stable-no-mesh-tls

In response to this:

/test bla

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@markusthoemmes
Copy link
Contributor Author

/test pull-knative-serving-upgrade-tests

@@ -217,6 +219,13 @@ var (
},
},
ProgressDeadlineSeconds: ptr.Int32(0),
Strategy: appsv1.DeploymentStrategy{
Type: appsv1.RollingUpdateDeploymentStrategyType,
RollingUpdate: &appsv1.RollingUpdateDeployment{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually discussed that with Matt some 2 years ago, though, I actually wanted to use more aggressive settings :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More aggressive in which ways?

@markusthoemmes
Copy link
Contributor Author

/test pull-knative-serving-upgrade-tests

Sometimes we DO update the revision deployment, for example for config
changes and for updates to the queue-proxy. We want those updates to
have as little interruption as possible, so this requires the pods to be
rolled in a way that there is never less pods than requested.
@markusthoemmes
Copy link
Contributor Author

/unhold

@knative-prow-robot knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 14, 2021
@markusthoemmes
Copy link
Contributor Author

/assign @julz @vagababov @dprotaso

@julz
Copy link
Member

julz commented Apr 20, 2021

I'm pretty sure this is fine and what we should do, but can we quantify (maybe a quick experiment to see?) what this will mean in terms of number of pods before/after the change for users/operators. Especially in contexts where you're charging Actual Money per replica (but even if not and it's just how many extra nodes you need during upgrade) that seems like important info.

tl;dr is the result here that we end up with one or two more pods during deploy, or is it that we end up with a significant fraction of the total extra while the deployment rolls out?

@markusthoemmes
Copy link
Contributor Author

markusthoemmes commented Apr 20, 2021

@julz see the comment on that in the intro. For the K8s rollout mechanism, Terminating is equal to "Gone". Since our pods take like 40s to completely shut down, you're looking at doubling the amount of pods in the cluster for any rollout taking <40s. That is also true for today's behavior.

The only tangible diff is, with this in place, the rollout will start by adding containers and only start removing once those added ones are ready.

If you're charging for Terminating pods as well, while they are terminating, I don't think anything changes substantially. If you're only charging for ready pods, you might see 1 or 2 extra pods in this case.

@markusthoemmes markusthoemmes changed the title Use the most defensive rollout strategy for revision updates Make sure deployments never underscale on rollouts Apr 20, 2021
@markusthoemmes
Copy link
Contributor Author

markusthoemmes commented Apr 20, 2021

Or to put more simply: With this, you're tracking 0/+2 extra ready pods, whereas before we've been tracking roughly -1/+1 ready pods.

@julz
Copy link
Member

julz commented Apr 20, 2021

With this, you're tracking 0/+2 extra ready pods, whereas before we've been tracking roughly -1/+1 ready pods.

that's the line I was looking for 👍

@dprotaso
Copy link
Member

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 28, 2021
@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 28, 2021
@knative-prow-robot knative-prow-robot merged commit 5af4e84 into knative:main Apr 28, 2021
@myrat92
Copy link

myrat92 commented Sep 23, 2021

Hello, @markusthoemmes What ways do we offer to update to the queue-proxy? use "kubectl edit deploy"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/API API objects and controllers cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants