这是indexloc提供的服务,不要输入任何密码
Skip to content

Disable activator probe optimisation when exec probe is specified #12250

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 12, 2021

Conversation

julz
Copy link
Member

@julz julz commented Nov 10, 2021

Fixes #11693.

The activator probe optimisation relies on Queue Proxy being able to perform the
required probes, but in the case of exec probes it can't. This means
that activator can start proxying to pods before they're ready if an
exec probe is used. This commit disables the optimisation and relies on
the kubernetes readiness signal instead if an exec probe is used.

Note: because we also use the probe for autodetecting mesh mode (unless
explicitly told whether mesh mode is on in the config) we still do probe
even if an exec probe is passed, we just also require kubernetes to
report ready.

Release Note

The activator optimisation which directly probes the queue proxy for readiness rather than waiting for Kubernetes to report readiness is now disabled when exec probes are used (since queue proxy cannot execute these probes on the user container's behalf).

The activator probe relies on Queue Proxy being able to perform the
required probes, but in the case of exec probes it can't. This means
that activator can start proxying to pods before they're ready if an
exec probe is used. This commit disables the optimisation and relies on
the kubernetes readiness signal instead if an exec probe is used.

Note: because we also use the probe for autodetecting mesh mode (unless
explicitly told whether mesh mode is on in the config) we still do probe
even if an exec probe is passed, we just _also_ require kubernetes to
report ready.
@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 10, 2021
@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label Nov 10, 2021
@knative-prow-robot knative-prow-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/autoscale area/networking labels Nov 10, 2021
@codecov
Copy link

codecov bot commented Nov 10, 2021

Codecov Report

Merging #12250 (6a632a6) into main (fdaccb7) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main   #12250      +/-   ##
==========================================
- Coverage   87.32%   87.30%   -0.03%     
==========================================
  Files         195      195              
  Lines        9584     9600      +16     
==========================================
+ Hits         8369     8381      +12     
- Misses        936      939       +3     
- Partials      279      280       +1     
Impacted Files Coverage Δ
pkg/activator/net/revision_backends.go 91.73% <100.00%> (ø)
pkg/apis/config/defaults.go 87.83% <0.00%> (-4.10%) ⬇️
pkg/apis/serving/v1/revision_defaults.go 97.22% <0.00%> (+0.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d870410...6a632a6. Read the comment docs.

@julz julz changed the title WIP: Disable activator probe optimisation when exec probe is specified Disable activator probe optimisation when exec probe is specified Nov 10, 2021
@knative-prow-robot knative-prow-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 10, 2021
@julz
Copy link
Member Author

julz commented Nov 10, 2021

/assign @markusthoemmes @psschwei @dprotaso

Copy link
Contributor

@psschwei psschwei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, but I'll give @markusthoemmes and/or @dprotaso a chance to weigh in (though I guess Markus already did in the issue...)

@@ -253,7 +263,7 @@ func (rw *revisionWatcher) probePodIPs(ready, notReady sets.String) (succeeded s
dest := dest // Standard Go concurrency pattern.
probeGroup.Go(func() error {
ok, notMesh, err := rw.probe(egCtx, dest)
if ok {
if ok && (ready.Has(dest) || rw.enableProbeOptimisation) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that Go prefers fewer parentheses, but I still had to stop and think here for a split second (this is a Scrooge-level nit, albeit of the McDuck variety rather than Ebenezer...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing obvious, but I don't think there's a way to put any more parentheses on this line?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind me... I wasn't looking at the right commit

if err != nil {
return nil, err
}

enableProbeOptimisation := true
if rp := rev.Spec.GetContainer().ReadinessProbe; rp != nil && rp.Exec != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't doublechecked: What's the current state of probes for multi-container services? Are we still only allowing probes on the main container? If so, then this is fine. If not, this'd have to change to check if any other container has any probes on them as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we do indeed disallow probes for multicontainers:

name: "flag enabled: probes are not allowed for non serving containers",

Copy link
Contributor

@markusthoemmes markusthoemmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 12, 2021
@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: julz, markusthoemmes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/autoscale area/networking cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Readiness Problem using handle exec probes in scale from zero
5 participants