Make sure deployments never underscale on rollouts #11140
Conversation
Codecov Report
@@ Coverage Diff @@
## main #11140 +/- ##
==========================================
+ Coverage 87.69% 87.75% +0.06%
==========================================
Files 191 190 -1
Lines 9193 9173 -20
==========================================
- Hits 8062 8050 -12
+ Misses 879 869 -10
- Partials 252 254 +2
Continue to review full report at Codecov.
/test bla
@markusthoemmes: The specified target(s) for /test bla were not found. Use /test all to run all available jobs.
In response to this: /test bla
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test pull-knative-serving-upgrade-tests
@@ -217,6 +219,13 @@ var (
			},
		},
		ProgressDeadlineSeconds: ptr.Int32(0),
		Strategy: appsv1.DeploymentStrategy{
			Type: appsv1.RollingUpdateDeploymentStrategyType,
			RollingUpdate: &appsv1.RollingUpdateDeployment{
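The diff is truncated before the RollingUpdateDeployment fields. As a minimal sketch of the kind of rolling-update settings that keep a Deployment from ever dropping below its desired replica count, something like the following would work; the concrete values (maxUnavailable of 0, maxSurge of 1) are illustrative assumptions, not necessarily the exact values this PR ships.

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// noUnderscaleStrategy builds a rolling-update strategy that never lets the
// Deployment dip below its desired replica count: new pods are surged in
// first, and old pods are only removed once their replacements are ready.
// The surge of 1 is an illustrative assumption, not the value from the PR.
func noUnderscaleStrategy() appsv1.DeploymentStrategy {
	zero := intstr.FromInt(0) // never tolerate a missing pod during the roll
	one := intstr.FromInt(1)  // allow one pod above `replicas` while rolling

	return appsv1.DeploymentStrategy{
		Type: appsv1.RollingUpdateDeploymentStrategyType,
		RollingUpdate: &appsv1.RollingUpdateDeployment{
			MaxUnavailable: &zero,
			MaxSurge:       &one,
		},
	}
}

func main() {
	fmt.Printf("%+v\n", noUnderscaleStrategy())
}
```

With maxUnavailable pinned to 0, the Deployment controller has to bring a surged replacement pod to Ready before it is allowed to start terminating an old one.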
I actually discussed this with Matt some two years ago, though I wanted to use even more aggressive settings :)
More aggressive in which ways?
/test pull-knative-serving-upgrade-tests
Sometimes we DO update the revision deployment, for example for config changes and for updates to the queue-proxy. We want those updates to have as little interruption as possible, so this requires the pods to be rolled in a way that there are never fewer pods than requested.
Force-pushed from e2cb911 to ef3a40d
/unhold
/assign @julz @vagababov @dprotaso
I'm pretty sure this is fine and what we should do, but can we quantify (maybe with a quick experiment?) what this will mean in terms of the number of pods before/after the change for users/operators? Especially in contexts where you're charging Actual Money per replica (but even if not, it's still how many extra nodes you need during an upgrade), that seems like important info. tl;dr: is the result here that we end up with one or two more pods during a deploy, or do we end up with a significant fraction of the total extra while the deployment rolls out?
@julz see the comment on that in the intro. For the K8s rollout mechanism, Terminating is equal to "Gone". Since our pods take around 40s to completely shut down, you're looking at doubling the number of pods in the cluster for any rollout taking <40s. That is also true for today's behavior. The only tangible difference is that, with this in place, the rollout will start by adding containers and only start removing once those added ones are ready. If you're charging for Terminating pods as well while they shut down, I don't think anything changes substantially. If you're only charging for ready pods, you might see 1 or 2 extra pods in this case.
Or to put it more simply: with this, you're tracking 0/+2 extra ready pods, whereas before we've been tracking roughly -1/+1 ready pods.
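A rough way to sanity-check those before/after numbers: the Deployment controller resolves maxSurge by rounding up and maxUnavailable by rounding down against the replica count, so with maxUnavailable at 0 a rollout can only ever add ready pods on top of replicas. The sketch below is illustrative only; the 25% surge and the replica counts are assumed values, and it uses the intstr helper from apimachinery to mirror that rounding.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Illustrative values: a percentage-based surge and no tolerated unavailability.
	maxSurge := intstr.FromString("25%")
	maxUnavailable := intstr.FromInt(0)

	for _, replicas := range []int{1, 4, 10} {
		// maxSurge rounds up and maxUnavailable rounds down, mirroring how the
		// Deployment controller resolves these fields against the replica count.
		surge, _ := intstr.GetScaledValueFromIntOrPercent(&maxSurge, replicas, true)
		unavail, _ := intstr.GetScaledValueFromIntOrPercent(&maxUnavailable, replicas, false)

		fmt.Printf("replicas=%d: ready pods stay within [%d, %d] during a rollout\n",
			replicas, replicas-unavail, replicas+surge)
	}
}
```

For replicas=10 this prints a band of [10, 13] ready pods, which matches the "0/+N extra ready pods" framing above.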
that's the line I was looking for 👍
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dprotaso
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Hello @markusthoemmes, what ways do we offer to update the queue-proxy? Using "kubectl edit deploy"?
Proposed Changes
Ref #11092
Sometimes we DO update the revision deployment, for example for config changes and for updates to the queue-proxy. We want those updates to have as little interruption as possible, so this requires the pods to be rolled in a way that there are never fewer pods than requested.
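If someone wants to observe the "never fewer pods than requested" property directly during an upgrade, one hypothetical approach is to poll the revision Deployment's status while the rollout runs and flag any dip of ready replicas below the desired count. The sketch below uses client-go; the kubeconfig loading, namespace, and deployment name are placeholder assumptions, not anything defined in this PR.

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// watchRollout polls a Deployment and reports whenever its ready-replica count
// dips below the desired replica count, i.e. whenever a rollout underscales.
func watchRollout(ctx context.Context, client kubernetes.Interface, ns, name string) error {
	for {
		d, err := client.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		desired := int32(1)
		if d.Spec.Replicas != nil {
			desired = *d.Spec.Replicas
		}
		if d.Status.ReadyReplicas < desired {
			fmt.Printf("underscaled: %d/%d ready\n", d.Status.ReadyReplicas, desired)
		} else {
			fmt.Printf("ok: %d/%d ready (%d pods total)\n", d.Status.ReadyReplicas, desired, d.Status.Replicas)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(2 * time.Second):
		}
	}
}

func main() {
	// Load the local kubeconfig; this is a standalone sketch, not part of the PR.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// Hypothetical revision deployment name; substitute one from your cluster.
	if err := watchRollout(ctx, client, "default", "hello-00001-deployment"); err != nil {
		fmt.Println("stopped:", err)
	}
}
```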
Context on the rollout speed: This does not significantly impact the speed at which a rollout is done. As soon as the old pods are Terminating, the next batch will be rolled, so this just makes sure we never dip below the replicas setting of the Deployment. Since our pods take ~40s to shut down anyway, this likely doesn't even significantly impact the amount of resources used by the rollout.
Release Note